1 Introduction


This report presents the current design for Cronus, the 
system being developed under the Distributed Operating System 
Design and Implementation project sponsored by Rome Air 
Development Center(1). It is intended as an overview of the 
system structure and as a synopsis of the current 
system/subsystem decomposition and specification. 


This report is a revision of two earlier drafts, BBN Report
No. 5260, November 1982, and BBN Report No. 5646, May 1984. A
previous report, "Cronus, A Distributed Operating System:
Functional Definition and System Concept", BBN Report No. 5884, is
intended as a companion to the current report, and the reader is
assumed to be familiar with its contents.


In Section 2, we briefly review a few of the areas covered
in the Functional Definition and extend them to cover current
development plans.


Section 3 presents an overview of the Cronus operating
system, stressing the common framework into which its components
will fit and the functional decomposition of the system.


Sections 4 through 12 present the design for the various
system functions. In a number of areas the design is only
partially complete. These sections will form the basis of a
continuing and evolving subsystem specification for the various
components throughout the life of the project.


Section 13 sketches how the system supports some common
functions. Other sections describe the system environment,
including hardware, the Virtual Local Network, GCE software, and
system utilities and libraries.





(1). This work is being performed under RADC contract No. 
F30602-81-C-0132 


In previous versions of this document, detailed descriptions
of the commands and functions in this system, as well as of the
objects, operations, and formats used in the decomposition, were
included. Much of this material is more appropriate to the
Cronus User's Manual. Many details which were included in the
earlier versions of the System/Subsystem Description have been
moved from this report to the User's Manual. In addition,
detailed design notes made during the implementation of the system
are included there. Cross references to the manual appear
throughout the System/Subsystem Description. These are of the
form (see Cronus User's Manual topic(number)), where topic is the
name of a page in the manual, and number is the section number
within the Cronus User's Manual where one may find the page.


Many people, in addition to the current Cronus project
development staff listed as authors of this report, have
contributed both ideas and enthusiastic effort in designing and
constructing the system described. These people include William
MacGregor, Benjamin Woznick, David Mankins, Robert Walsh, Ed
Burke, Steve Toner, Mort Hoffman, and Steve Geyer.


2 Cronus Project Overview
2.1 Project Objectives 


The objective of the Cronus project is to develop a testbed
for evaluating distributed system technology. To do this we are
establishing a prototype hardware architecture based on a local
area network, and building an operating system and software
architecture to organize and control this distributed system.


The architecture was partially specified by the statement of
work and further defined during early stages of the project. It
is described in the Cronus Functional Description [BBN 5041] and
is summarized in Section 2.4. In addition to establishing a
system architecture, the other major aspects of the Cronus
project activities are:


1) Select off-the-shelf hardware and software components as
a basis for an Advanced Development Model (ADM) prototype
configuration for the distributed system;

2) Design the system;

3) Implement a version of the basic system components; and

4) Test and evaluate the concepts and realization of the DOS
in the Advanced Development Model.


The orientation we have chosen is both experimental, through
construction of working system components, and evolutionary,
through pre-planned continuation of design and development
activities.


2.2 Points of Emphasis


The Cronus design is intended to introduce coherence and
uniformity to a set of otherwise independent and disjoint
computer systems. This grouping of machines, operating under the
control of a distributed operating system, is called a Cronus
cluster. The aim is to provide for the cluster configuration as
a whole features comparable to those found in a modern
centralized computer utility. There are various ways of viewing
this uniformity and coherence; each plays a role in the Cronus
design.


From an end user's point of view, the Cronus DOS provides a
single account with access to all integrated system services, a
uniform distributed filing system, and a uniform program execution
facility, which is independent of the site of the activity. From
a programmer's point of view, Cronus provides a uniform interface
and access path to the distributed system resources, and supports
the initiation and control of distributed computations. More
importantly, from both an end user's and a programmer's
perspective, Cronus provides a common system framework for
applications. This means that otherwise independent computerized
activities can be constructed so that they are more easily made
to work together, despite implementations which cross host and
processor-type boundaries.


From an operations and administrative perspective, Cronus
provides a logically centralized facility for monitoring and
controlling all of the connected systems. Functions such as
account authorization, user priority, and access control can be
applied system-wide rather than individually to each host.


In addition to coherence and uniformity, there are a number
of other system design goals. These are:

o Survivability and integrity of Cronus itself,

o Scalability to accommodate both small and large
configurations,

o Experimentation with resource management strategies that
affect global performance,

o Component substitutability to allow easy use of alternate,
functionally equivalent hardware, and

o Convenient operation and maintenance procedures.


2.3 System Phases


System development consists of three phases. The first
phase, coincident with the development of the functional
definition, included component selection, installation,
interconnection, and testing. The second phase includes the
design and implementation of the basic system that will provide
uniformity and coherence to the collection of machines. It
also provides the framework for the in-depth design,
implementation, and experimentation in the other areas of
interest (e.g., survivability), which are to occur in the third
phase. The second phase design is the principal subject of the
remaining sections of this report. In certain areas, elements
of the third phase design are sketched as well.


2.4 The Cronus Hardware Architecture 
2.4.1 System Environment 


The Cronus environment consists of several parts: the local 
area network which provides the communications substrate for a 
Cronus cluster, the set of hosts upon which the Cronus system 
operates, and a mechanism for connecting a Cronus cluster to the
Internet environment and to other Cronus clusters. 


Cronus enables a variety of constituent computer systems to 
operate in an integrated manner. Cronus is distinguished from 
other distributed operating systems by one or more of the 
following characteristics: 


1. Cronus will run on a group of heterogeneous hosts. 


2. Cronus hosts will run operating systems which are largely 
unmodified. The Cronus distributed operating system 
software runs as an adjunct rather than a replacement for 
the hosts' primary operating systems.


3. Hosts will be included in Cronus with varying degrees of 
system integration. Some support limited subsets of the 
services defined by the Cronus environment. 


4. The interconnection network is designed on a hierarchical
model. A Cronus cluster includes a set of hosts
connected by a high-speed, low-latency local network. A
set of Cronus clusters may be connected over slower
long-haul networks.


The Cronus architecture provides a flexible environment for 
connecting hosts so that facilities available on one host may be 
conveniently used from other hosts. It provides two alternative 
host integration schemes. A host may implement the Cronus 
Interprocess Communication (IPC) mechanism and have efficient 
communication and operations with the rest of the Cronus hosts,
or it may access the other Cronus hosts through a front end
access machine, which is a simpler, less expensive option for
connection of a host, but which may be more limited from a
flexibility and performance viewpoint.


2.4.2 Host Classes 


Cronus hosts can be divided into four groups: mainframe
hosts, Generic Computing Elements (GCEs), workstations, and
internet gateways.


The collection of mainframe hosts, each of which serves a
number of users simultaneously, includes a variety of machines
with unrelated architectures. A mainframe host may be tightly
integrated into the system, both offering and using Cronus
services and fully implementing Cronus interprocess
communication. Alternatively, it may be loosely integrated,
offering no services and possibly connecting into Cronus through
an access machine which provides communication with the rest of
Cronus.


GCEs are small, dedicated-function, microprocessor-based
computers of a single architecture but varying configuration.
Each GCE provides a basic service. For example, a GCE can be a
file manager, a terminal manager, or an access machine, or it
might carry out a more complex system function, such as serving
as an authorization manager. Since all GCEs have the same
architecture, they provide a replicated resource which, with the
appropriate software, enhances the reliability of basic Cronus
functions.


Workstations are powerful, dedicated computers which provide
substantial computing power and graphics capability at the
disposal of a single user. They differ from mainframes in that
they support a single user. They differ from terminals in that
they offer significant computational resources.

An internet gateway is a computer used to interface
communication between multiple networks. The Cronus gateway
integrates the Cronus cluster into the collection of networks
known as the ARPA internet and provides a base for supporting
remote access and intercluster communication.

2.4.3 System Access


There are a variety of user access paths to Cronus. One is
a connection by means of a Cronus terminal concentrator. Users
may also gain access through the internet gateway from remote
points. Cronus also supports access through terminal access
mechanisms on its mainframe hosts. These latter two access paths
provide the same interface to the user as the terminal
concentrator. Access from a workstation may differ from access
through a terminal, since the workstation defines the user
interface. The user has immediate access to the workstation's
capabilities.


2.4.4 Local Area Network


The set of hosts is connected by a local area network. The
characteristics of the network are crucial to the success of
Cronus, since they determine the kinds of communication and
operations that are feasible across host components of Cronus.


The selection of an Ethernet for the local area network for
the Advanced Development Model has been described in a recent
report [BBN 5086]. This choice was motivated by criteria in the
project's statement of work:


1. The network should be suitable to support a distributed
operating system.


2. The network should be currently available and economical.
Since the Advanced Development Model will not be operated
in a stressed environment, certain constraints applicable
to a field-deployable version were considerably relaxed.


The Ethernet was chosen for the local area network substrate
for the following reasons:


o The network must be "high-speed". For the ADM, the
network should operate at rates of megabits per second
(Mbit/s) with low latency, with higher speeds desirable.
The Ethernet operates at 10 Mbit/s.


o Network interfaces to all or most of the computer systems
in the DOS ADM should be available. With the exception
of the C70, whose Ethernet interface has been constructed
under the present contract, this was the case.


o The local network must provide a datagram-style service.


The Ethernet fulfills all three requirements and, we believe,
is at the present time the most cost-effective network technology
which does. In addition, the Ethernet provides broadcast and
multicast capabilities which have been extensively exploited in
the system design.


The raw Ethernet layer will not be used directly. To
achieve convenient substitutability of alternate communication
substrates, Cronus will use an abstraction of the Ethernet's
capabilities which is provided by a Virtual Local Net (VLN)
software layer, described in Section 14.2. The VLN represents an
enhancement of the DoD standard IP protocol to provide for
features common to local area communication. We anticipate that
future versions of Cronus will need to be built upon a different
local network, such as the Flexible Interconnect, with the
reliability, communication security, and ruggedization not
available in current commercial products. By designing the VLN
layer and building Cronus upon it, it should be easy to
substitute any local network that provides the basic transport
services required by Cronus.
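The substitutability argument can be made concrete with a minimal sketch in Python of code written against a datagram abstraction rather than a specific network. The names `VirtualLocalNet`, `InMemoryNet` and `announce`, and the method signatures, are hypothetical illustrations, not the VLN interface of Section 14.2. The in-memory substrate only stands in for an Ethernet driver.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class VirtualLocalNet(ABC):
    """Hypothetical VLN-style interface: the datagram services the upper
    layers rely on, independent of the physical substrate."""

    @abstractmethod
    def send(self, dest_host: int, payload: bytes) -> None: ...

    @abstractmethod
    def broadcast(self, payload: bytes) -> None: ...

    @abstractmethod
    def receive(self, host: int) -> list: ...


class InMemoryNet(VirtualLocalNet):
    """Toy substrate that queues datagrams in memory (stand-in for a LAN driver)."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.queues = defaultdict(list)  # host -> pending datagrams

    def send(self, dest_host, payload):
        self.queues[dest_host].append(payload)

    def broadcast(self, payload):
        for host in self.hosts:
            self.queues[host].append(payload)

    def receive(self, host):
        # Hand back everything pending for this host and clear its queue.
        msgs, self.queues[host] = self.queues[host], []
        return msgs


def announce(vln: VirtualLocalNet, text: str) -> None:
    # Code above the VLN sees only the abstract interface, so swapping
    # the substrate does not require changing this function.
    vln.broadcast(text.encode())
```

Replacing `InMemoryNet` with any other subclass that implements the same three methods leaves `announce` unchanged. That is the property the VLN layer is meant to provide.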


2.4.5 Types of Hosts


GCEs are implemented in the ADM system by Multibus computers
with Sun processor boards (the current vendor, one of several, is
Forward Technology), large main memories, an Ethernet
controller, and additional hardware (disks, RS-232 ports, etc.)
needed to support specific functions(2). The Multibus computers
were chosen because:


1. They are relatively inexpensive, permitting low-cost
incremental system growth.


2. The Multibus standard guarantees the ability to package
individual GCEs in different ways with components from a
variety of vendors.


3. New processors and devices are expected to evolve for the
Multibus over time.


Utility hosts provide the program development and
application execution environments for Cronus. In the ADM, this
function will be supported by C70 UNIX systems, VAX-UNIX systems,
and a VAX-VMS system. UNIX was chosen due to the rich set of
development tools already available for it and the ease of
developing new tools and applications. The C70 was chosen
because it was one of the least expensive computers which
supports a multi-user UNIX, and because of the in-house expertise
and support for the hardware base. The UNIX support will
gradually shift to VAX-UNIX. A VAX running the VMS operating
system was chosen to demonstrate the handling of heterogeneous
systems.


(2). One of the functions we would normally install on a GCE is
the Cronus Internet Gateway, which will be installed on a DEC
LSI-11 computer instead, because the standard Internet Gateway
implementation uses the LSI-11.


2.4.6 Cronus Clusters and the Internet


The goal of the Cronus project is development of a local
area network-based distributed operating system. The Cronus
cluster will operate in the Internet environment as a class B
network. Cronus hosts will support the DoD Internet Protocol
(IP) for datagram traffic, and, where connections are required,
the DoD Transmission Control Protocol (TCP).


A Cronus cluster is expected to use the Internet environment
in two ways. First, access will be provided to Cronus from
points in the Internet external to the cluster. Second, the
Internet will support communication between distinct Cronus
clusters.


2.4.7 The Advanced Development Model


The Advanced Development Model (ADM) of Cronus is the first
instantiation of the Cronus hardware and software. It is, as its
name suggests, the development testbed for Cronus. The ADM is
experimental and can be expected to undergo rapid change as
Cronus is developed and software is implemented, altered, and
improved.


The ADM is being assembled using many off-the-shelf
commercial hardware and software component building blocks. This
reduces the cost of its components, permits the use of newly
available state-of-the-art hardware, and enables us to be more
flexible in its design. We are developing a design with
sufficient flexibility to permit later substitution of more
suitable hardware and software for deployable configurations.


3 System Overview


A distributed operating system manages the resources of a 
collection of connected computers and defines functions and 
interfaces available to application programs on system hosts. 
Cronus provides functions and interfaces similar to those found 
in any modern, interactive operating system (see the Cronus 
Functional Definition and System Concept Report [BBN 5041]). 
Cronus functions, however, are not limited in scope to a single 
host. Both the invocation of a function and its effects may 
cross host boundaries. The distributed functions which Cronus
supports are:


o generalized object management
o global name management
o authentication and access control
o process and user session management
o interprocess communication
o a distributed file system
o input/output processing
o system access
o user interface
o system monitoring and control.


In this section, we introduce the Cronus design and briefly 
discuss the major elements of the system decomposition. 


3.1 System Concept 


The primary design goal for Cronus is to provide
uniformity and coherence to its system functions throughout the
cluster. Host-independent, uniform access to data and services
forms the cornerstone for resource sharing. The design of Cronus
is based on an abstract object model. In this model, we treat
the system as a collection of objects organized in classes:
files, processes, directories, and so forth. Only a limited
number of well-defined operations can be invoked on an object,
and the only information that a client can have about the
structure or content of the object is obtained through these
operations. The system structure is defined by the objects which
constitute the system, the operations on these objects, and the
responses which the objects give to the operations. The
underlying structure of the system, which is essentially hidden
from the clients, consists of the primitives which deliver the
operations to active objects (processes), or to processes which
are responsible for passive objects like files.


The Cronus distributed operating system is built from a
number of concurrently existing objects called processes that
reside on hosts which are part of the cluster. Some of them,
called object managers, play a special role in implementing other
objects of the system. Other processes provide services and
functions for the clients of the system. Still other processes
run user programs. Processes communicate with each other to form
larger abstractions and build more complex objects. At the most
fundamental level, communication between processes is through
messages sent over a local area network connecting the hosts of
the cluster.


There are four interrelated parts to the Cronus system
model:


o A kernel which supports the basic elements of the object
model: processes, communication between objects, object
addressing, and the relationship between objects and
their manager processes. This part of the system
includes facilities for locating an object and
controlling access to it.


o A set of basic object types, along with the object
managers which implement them. There are two groups of
basic object types. One group is fundamental to the
development of new object managers in Cronus. This group
of object types includes processes, user records, and
symbolic name directories. Another group of basic
objects is provided to support various application
domains and processing requirements. Initially for
Cronus this includes files and I/O devices.


o A paradigm for building and accessing new types of
objects, which spells out the methods for integrating new
object managers.


o User interfaces and related utility programs to provide
convenient access for both people and programs to the
system objects and services.


3.2 The Cronus Object Model

The object model provides a coherent and uniform framework
for the system components of Cronus, and potentially for
application programs in a Cronus cluster. Since a distributed
operating system is itself a distributed application, the
methodology used in its construction should apply equally well to
the construction of other distributed applications. The
references [Xerox 1981, Rentsch 1982] discuss the object-oriented
model of programming. The following are the key features of the
object-oriented model that Cronus supports:


o Each Cronus object is a member of a well-defined class,
which is called the type of the object. The names of
Cronus types begin with the string 'CT_'; a list of some
of the more important types may be found in Table 3.1.


o There is a set of operations (often called methods in the
literature) defined for each Cronus type. These define
the only ways that an object can be examined or modified.


o Every Cronus object has a unique identifier (UID) name.
References to the object are generally through its UID,
which is a bitstring uniquely identifying the object over
the entire Cronus cluster. Cronus also has a symbolic
catalog for cataloging UIDs to provide convenient
reference to objects.


o The primitive Invoke causes a named operation to be
performed on a named object.


o There is a basic set of operations (called generic
operations) which are defined for all objects; these
operations promote a unity among the various object types
of the system and constitute a limited form of
inheritance of the operations defined for the basic type
CT_Object. These operations include those which create
and remove objects, and those which control access. Each
Cronus type then has its own operations, and may redefine
operations which are known to its parent class.


o An object has one or more parts that are visible to the
outside world. These may include data, an object
descriptor, and an active (or process) component. All
Cronus objects have at least an object descriptor, which
is the repository for such information as access rights.


Object Name              See Section
CT_Object                4.2
CT_Host                  5.1.4
CT_Cronus_Process        5.1.2
CT_Primal_Process        5.1.3
CT_Program_Carrier       8.2
CT_Cronus_Catalog        9.2
CT_Catalog_Entry         9.2.1
CT_Directory             9.2.2
CT_Symbolic_Link         9.2.3
CT_External_Link         9.2.4
CT_Cronus_File           8.1
CT_Primal_File           8.1
CT_Migratory_File        8.1
CT_Dispersed_File        8.1
CT_Executable_File       8.1
CT_Principal             7.5.2
CT_Group                 7.5.3
CT_Authentication_Data   7.5.1
CT_Session_Data          11
CT_Line_Printer          10


Table 3.1 Cronus Objects


Fundamentally, the implementation of the Cronus system kernel
consists of the implementation of the primitive Invoke. Each
object is associated with an object manager, which knows all the
internal details of the construction and location of the object.
When an operation is invoked on an object, the Cronus kernel is
responsible for delivering the operation to the appropriate
object manager, which performs the task requested in the
operation and, if appropriate, responds to the invoker.


The operation switch in the Cronus kernel supports both
invocations of operations on objects and message communication
between processes. Since processes are system objects with
defined operations to send and receive messages, the operation
switch provides a host-independent interprocess communication
(IPC) facility for both the system implementation and application
programs. Further details of the object model and the design of
the operation switch are described in Section 4.
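The report describes the kernel essentially as the implementation of Invoke, and a toy model of the operation switch may make that concrete. The following Python sketch is a minimal single-host illustration under assumed names: `UID`, `Reply`, `OperationSwitch`, `ProcessManager`, the status strings and the operation names "Send" and "Receive" are all hypothetical. It routes each invocation by the target's type. It also shows interprocess communication as an ordinary invocation on a process object. Cross-host delivery is not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class UID:
    uno: int        # unique-number part
    type_code: int  # type part; selects the responsible manager


@dataclass
class Reply:
    status: str
    body: dict = field(default_factory=dict)


# A manager is modelled as a callable (invoker, target, operation, args) -> Reply.
Manager = Callable[[UID, UID, str, dict], Reply]


class OperationSwitch:
    """Toy operation switch: routes each invocation to the manager
    registered for the target object's type."""

    def __init__(self) -> None:
        self.managers: Dict[int, Manager] = {}

    def register(self, type_code: int, manager: Manager) -> None:
        self.managers[type_code] = manager

    def invoke(self, invoker: UID, target: UID, operation: str, args: dict) -> Reply:
        manager = self.managers.get(target.type_code)
        if manager is None:
            return Reply("no_manager")
        # Pass the invoker's UID along so the manager can identify the requester.
        return manager(invoker, target, operation, args)


class ProcessManager:
    """Hypothetical manager for process objects: 'Send' queues a message for
    the target process; 'Receive' drains that process's queue."""

    def __init__(self) -> None:
        self.mailboxes: Dict[UID, list] = {}

    def __call__(self, invoker: UID, target: UID, operation: str, args: dict) -> Reply:
        box = self.mailboxes.setdefault(target, [])
        if operation == "Send":
            box.append((invoker, args.get("text")))
            return Reply("ok")
        if operation == "Receive":
            msgs, self.mailboxes[target] = box, []
            return Reply("ok", {"messages": msgs})
        return Reply("unknown_operation")
```

Because sending a message is just `invoke(sender, receiver, "Send", ...)`, the same routing path serves both object operations and IPC, which is the point the paragraph above makes.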


Some of the attractiveness of a distributed architecture is
the potential to utilize the redundancy and configuration
flexibility inherent in the hardware architecture. Cronus
supports a unified approach to these attributes through its
object orientation. In general, three somewhat different classes
of objects will be accessed in Cronus. These are:


1. Primal Objects

These are forever bound to the host that created them.
There is no simpler form of Cronus object. An example
would be a Primal File, which is permanently bound to its
storage site.


2. Migratory Objects

These are objects that may move from host to host as
situations and configurations change. A standard Cronus
mechanism can locate the current site to complete an
object access.


3. Structured and Replicated Objects

These are objects which have more internal structure than
a single uniquely identified object. For example, a
replicated file would have a number of primal files as
its constituent parts. The UID would be recognized by the
manager processes on each of the sites for the more
primitive elements. Replicated objects are a key element
in Cronus system survivability.
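The locate mechanism for migratory objects can be pictured as a cache of last-known locations backed by a cluster-wide query. The Python sketch below is a minimal illustration under assumed names (`Cluster`, `locate`, `hint_cache`), not the Cronus locate protocol. A loop over all hosts stands in for a broadcast on the local network. A primal object would always be found at its hint, so the fallback query is needed only for objects that have moved.

```python
class Cluster:
    """Toy cluster: each host's managers hold a set of object numbers."""

    def __init__(self, holdings):
        self.holdings = holdings   # host -> set of object numbers stored there
        self.hint_cache = {}       # object number -> last known host
        self.broadcasts = 0        # counts fallback queries, for illustration

    def locate(self, uno, hint=None):
        # 1. Try the cached location first, then the caller's hint
        #    (for example, the host that created the object).
        for host in (self.hint_cache.get(uno), hint):
            if host is not None and uno in self.holdings.get(host, set()):
                self.hint_cache[uno] = host
                return host
        # 2. Fall back to asking every host (a broadcast on the real LAN).
        self.broadcasts += 1
        for host, objects in self.holdings.items():
            if uno in objects:
                self.hint_cache[uno] = host
                return host
        return None  # no host currently holds the object
```

After an object moves, the first access pays for one query and refreshes the cache; later accesses go straight to the new site.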




Cronus can be extended by adding new object types to support
new requirements or functions. Certain features are required for
each object type, including support for the generic operations. In
addition, the object model and its associated system components
define a number of system conventions, such as integration with
the monitoring and control software, which may be adopted by
subsystem designers on a case-by-case basis. A subsystem
designer can depend upon the existence of required features in
other system components, and is obligated to provide them in each
new component. On the other hand, the Cronus system design
minimizes the number of required features for system entities,
which, in turn, reduces the buy-in costs for new hosts and object
types.


Maintaining the integrity of complex objects is the
responsibility of the managers for the type. This means that
techniques can be tailored to the patterns of access to the
object being maintained.

Since the generic operations include those which manage
access permissions, uniform access control is a basic part of the
Cronus object model. The object managers control access to the
objects they maintain through the use of access control lists
(ACLs). The operation switch reliably stamps the UID of the
invoking process on each of its messages, so the process making
the request can be reliably identified.

The conventions for communication are based on the message
structure library (MSL). A message consists of key-value pairs.
There are also conventions that provide simple transaction
protocols, and other features to support flexible message
handling and processing. The MSL also standardizes the
representation of data types, which allows the common
interpretation of data items across a Cronus cluster. The MSL
design is discussed in Section 6.
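The idea of a key-value message with a standardized data representation can be illustrated with a small encoder and decoder. This Python sketch uses an invented wire format (a 2-byte key length, the key, a 1-byte type tag, then a fixed big-endian integer or a length-prefixed UTF-8 string). The actual MSL encoding is described in Section 6, not here. The point it shows is that fixing byte order and sizes lets hosts with different native representations read the same message.

```python
import struct

# Hypothetical type tags; the real MSL encoding is not given in this section.
TAG_INT, TAG_STR = 1, 2


def encode(msg: dict) -> bytes:
    """Encode {key: int | str} as a sequence of (key, tag, value) fields."""
    out = bytearray()
    for key, value in msg.items():
        k = key.encode("utf-8")
        out += struct.pack(">H", len(k)) + k
        if isinstance(value, int):
            out += struct.pack(">Bq", TAG_INT, value)  # fixed 64-bit, big-endian
        elif isinstance(value, str):
            v = value.encode("utf-8")
            out += struct.pack(">BI", TAG_STR, len(v)) + v
        else:
            raise TypeError(f"unsupported value type for key {key!r}")
    return bytes(out)


def decode(data: bytes) -> dict:
    """Inverse of encode: rebuild the key-value message."""
    msg, i = {}, 0
    while i < len(data):
        (klen,) = struct.unpack_from(">H", data, i)
        i += 2
        key = data[i:i + klen].decode("utf-8")
        i += klen
        (tag,) = struct.unpack_from(">B", data, i)
        i += 1
        if tag == TAG_INT:
            (value,) = struct.unpack_from(">q", data, i)
            i += 8
        elif tag == TAG_STR:
            (vlen,) = struct.unpack_from(">I", data, i)
            i += 4
            value = data[i:i + vlen].decode("utf-8")
            i += vlen
        else:
            raise ValueError(f"unknown type tag {tag}")
        msg[key] = value
    return msg
```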

3.3 System Objects


To provide the initial operating capability, a number of
basic system object types and their managers are being developed
to support the functions outlined in the Cronus Functional
Definition [BBN 5041]. They include:


o Process objects and process managers that support the
Cronus system and user programmable processes. They may
be linked together across the cluster, and connected
through interprocess communication to form a user
session.


o User identity objects and a permanent user data base that
support authentication and access control.


o Directory objects and catalog managers that implement the
global symbolic name space.


o File objects and file managers that provide a distributed
filing system which can be used in providing non-volatile
storage for developing portable object managers, as well
as for satisfying application program data storage
requirements.


o Device objects and device managers that support the
integration of I/O devices into Cronus.


Much of the Cronus design has been decomposed into the
subproblems of developing the Cronus distributed object model and
of designing the components which provide these basic system
objects. The design of these components is described in detail
in Sections 4-12 and in the Cronus User's Manual.


3.4 Cronus Name Spaces and Catalogs 


Cronus has two system-wide name spaces for referencing 
objects. The unique identifier (UID) for an object is the basic 
name. Unique identifiers are fixed-length, numeric quantities, 
intended for use by programs but unsuitable for people to read, 
remember, and type. The unique identifier has internal structure
which Cronus uses, but which is normally invisible to applications.
It contains the name of the object's type and the name of the host
that generated it. The host name is useful as a hint for locating
certain objects which do not migrate.


The Cronus system also includes a global symbolic name space
oriented toward human use. Normally, the accessing agent would
interact with the Cronus symbolic catalog manager to look up the
unique identifier for the object. After it obtains the UID, the
accessing agent can then Invoke operations on the object.
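The two-step access pattern (symbolic name to UID, then an operation on the UID) can be sketched in a few lines. In this minimal Python illustration, `SymbolicCatalog`, `read_by_name`, the slash-separated path syntax and the plain-integer UIDs are hypothetical. A dictionary of callables stands in for the operation switch and the object managers.

```python
class SymbolicCatalog:
    """Toy catalog manager mapping hierarchical symbolic names to UIDs."""

    def __init__(self):
        self.root = {}  # nested dicts: directory name -> subtree; leaf -> UID

    def enter(self, path: str, uid: int) -> None:
        *dirs, leaf = path.strip("/").split("/")
        node = self.root
        for d in dirs:
            node = node.setdefault(d, {})
        node[leaf] = uid

    def lookup(self, path: str) -> int:
        node = self.root
        for part in path.strip("/").split("/"):
            if not isinstance(node, dict) or part not in node:
                raise KeyError(path)
            node = node[part]
        if isinstance(node, dict):
            raise KeyError(f"{path} names a directory, not an object")
        return node


def read_by_name(catalog, objects, path):
    uid = catalog.lookup(path)   # step 1: symbolic name -> UID
    return objects[uid]("Read")  # step 2: invoke an operation via the UID
```

After step 1 the agent no longer needs the catalog: it can keep the UID and invoke further operations directly.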


Although there is no single identifiable catalog supporting
the UID name space, the notion of a catalog for UIDs is a useful
abstraction. This catalog will be referred to as the UID Table;
in practice, the functions that it supports are implemented by
object managers for different object types by means of
UID-to-object-descriptor tables, which can be thought of as
fragments of the UID Table. When a Cronus object is assigned a
UID, an entry is created in a UID table. This entry contains the
information that the manager needs to access the object. Object
managers support two kinds of operations. The generic
operations, for example, those used to create or remove an
object, to modify the access control list, and to examine the
object descriptor, are defined for all objects. Other operations
may be defined only on a particular type; these are often called
type-dependent operations.
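
To make this division of labor concrete, the following Python
sketch models one object manager holding its own fragment of the
UID Table, with generic operations shared by every type and
type-dependent operations added by a subtype. The class names, the
operation strings, and the descriptor layout are hypothetical
illustrations chosen for this sketch, not the Cronus interfaces;
in particular, the sketch lets the caller supply the UID for
Create rather than modelling how Cronus assigns it.

```python
class ObjectManager:
    """Hypothetical manager owning one fragment of the UID Table."""

    # Generic operations: defined for objects of every type.
    GENERIC_OPS = {"Create", "Remove", "Locate", "Read_ACL", "Write_ACL"}

    def __init__(self):
        # This manager's fragment of the UID Table: UID -> object descriptor.
        self.uid_table = {}

    def type_dependent_ops(self):
        # Subclasses list the operations defined only for their type.
        return set()

    def invoke(self, uid, op, *args):
        if op not in self.GENERIC_OPS | self.type_dependent_ops():
            raise ValueError(f"operation {op!r} not defined for this type")
        # Dispatch "Read_ACL" to op_read_acl, "Create" to op_create, etc.
        return getattr(self, "op_" + op.lower())(uid, *args)

    def op_create(self, uid):
        # Simplification: the caller supplies the UID for the new object.
        self.uid_table[uid] = {"acl": [], "data": b""}
        return uid

    def op_remove(self, uid):
        del self.uid_table[uid]

    def op_locate(self, uid):
        return uid in self.uid_table

    def op_read_acl(self, uid):
        return list(self.uid_table[uid]["acl"])

    def op_write_acl(self, uid, acl):
        self.uid_table[uid]["acl"] = list(acl)


class FileManager(ObjectManager):
    """Hypothetical file manager adding type-dependent Read and Write."""

    def type_dependent_ops(self):
        return {"Read", "Write"}

    def op_read(self, uid):
        return self.uid_table[uid]["data"]

    def op_write(self, uid, data):
        self.uid_table[uid]["data"] = data
```

Invoking Read on the base ObjectManager is rejected, because Read
is type-dependent and only the file manager's type defines it.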


The Cronus operation switch provides client processes with
addressing based on the UID, so if a client process has the UID,
it can communicate with the object. The UID is a universal name
that can be used from any one of the hosts in the cluster to
refer to the object, no matter where in the cluster it is stored.
Although it may not happen often in practice, objects may move
(or be moved) from one host to another. When an object is
relocated in this fashion, its UID does not change. A replicated
object also has a single, unique identifier for client access to
any of its images. Replicated objects may be developed out of
more primitive, non-replicated objects which are usually accessed
directly only by the replicated object manager.


A Cronus unique identifier actually consists of a pair

<UNO, Type>

where UNO is an 80-bit unique number, and Type is a 16-bit value
naming the type of the object. The UNO portion of the UID is
uniquely associated with a particular object. Each Cronus
service is assigned a type. In the current design, all types are
statically well-known. Since the type field can encode as many
as 65,536 distinct types, there is room for expansion to dynamic
types at a later time.
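
The pair can be held as a single 96-bit integer. The Python
sketch below assumes that the type occupies the low-order 16
bits; the text gives only the field widths, so this bit ordering
is an illustrative assumption, and the function names are
hypothetical.

```python
UNO_BITS = 80   # width of the unique number
TYPE_BITS = 16  # width of the Cronus type


def make_uid(uno: int, ctype: int) -> int:
    """Pack <UNO, Type> into one 96-bit integer, type in the low bits."""
    if not 0 <= uno < (1 << UNO_BITS):
        raise ValueError("UNO must fit in 80 bits")
    if not 0 <= ctype < (1 << TYPE_BITS):
        raise ValueError("type must fit in 16 bits")
    return (uno << TYPE_BITS) | ctype


def split_uid(uid: int) -> tuple:
    """Recover (UNO, Type) from a packed UID."""
    return uid >> TYPE_BITS, uid & ((1 << TYPE_BITS) - 1)
```

Here `1 << TYPE_BITS` evaluates to 65,536, the number of distinct
types quoted above.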



Each Cronus type has a generic name associated with it; this
is a UID that has the type portion set to the type of the object
and the UNO portion set to zero. Cronus generic names are used for
a variety of purposes. They act as class names in many of the
places one would expect, particularly when an object is being
created. That is, the creation of an instance of a class is
treated as an operation on the generic name. In addition, the
generic name is used when the system is interrogating the
operation switch to find a manager for the type. In the current
implementation, the manager itself is implemented from a Cronus
primal process, which has a UID of type CT_Primal_Process that
was selected when the process was created. The operation switch
is responsible for identifying the process that manages objects
of a particular type. It does this by examining the type portion
of the UID on which the operation has been invoked.
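
The role of generic names in finding a manager can be sketched as
follows. This Python fragment assumes, as an illustration only,
that the UNO occupies the high-order 80 bits of a 96-bit integer
and the type the low-order 16 bits; `OperationSwitch`, `register`,
and `route` are hypothetical names, and the string standing in for
a manager is not a real CT_Primal_Process UID.

```python
TYPE_BITS = 16
TYPE_MASK = (1 << TYPE_BITS) - 1


def generic_name(ctype: int) -> int:
    """Generic name for a type: UNO portion zero, type portion set."""
    return (0 << TYPE_BITS) | ctype


def type_of(uid: int) -> int:
    """Extract the type portion of a UID."""
    return uid & TYPE_MASK


class OperationSwitch:
    """Hypothetical switch mapping each type to the process managing it."""

    def __init__(self):
        self.managers = {}  # type -> identifier of the manager process

    def register(self, ctype: int, manager) -> None:
        self.managers[ctype] = manager

    def route(self, uid: int):
        # Only the type portion of the UID is examined.
        return self.managers.get(type_of(uid))
```

Because only the type portion is examined, an operation on a
specific object and an operation such as Create on its type's
generic name reach the same manager.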


The facility that generates unique numbers may be regarded
as existing continuously throughout the life of a Cronus
configuration, and is accessible to system and application
processes. No two requests by client processes for a UNO ever
obtain the same UNO. Hence the unique number generator is an
example of a survivable distributed program. The generator must
be survivable, because UIDs must be unique over the lifetime of
the cluster, and it must be distributed, because without it new
objects cannot be created, so it cannot depend on any single host
being up.


The UNO consists of three fields: a HostNumber, a
HostIncarnation, and a SequenceNumber. The HostNumber is the
Internet address of the host that generated the UNO. The
SequenceNumber is incremented for each request. The
HostIncarnation is incremented if the SequenceNumber overflows
its field. It is also incremented whenever a host that has been
down comes up. In order to assure the uniqueness of the UNOs
which are generated, the HostIncarnation is kept in stable
storage, either on the host itself or on some other host that
supports stable storage.
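
The generation rules above can be sketched in Python. This is a
minimal, single-host illustration under stated assumptions: a
dictionary stands in for stable storage, the 24-bit field widths
are taken from the initial implementation described in Section
4.3, and the class and method names are hypothetical. It shows
the essential ordering: a new incarnation is recorded in stable
storage before any UNO from it is issued.

```python
class UnoGenerator:
    """Per-host UNO generator (hypothetical names; 24-bit fields assumed)."""

    INC_BITS = 24
    SEQ_BITS = 24

    def __init__(self, host_number, stable_store):
        self.host_number = host_number
        self.store = stable_store  # a dict standing in for stable storage
        # Coming up: start a new incarnation and record it stably
        # before any UNO from it is handed out.
        self.incarnation = self.store.get("incarnation", 0) + 1
        self._save_incarnation()
        self.seq = 0

    def _save_incarnation(self):
        if self.incarnation >= (1 << self.INC_BITS):
            raise OverflowError("HostIncarnation field exhausted")
        self.store["incarnation"] = self.incarnation

    def next_uno(self):
        """Return a (HostNumber, HostIncarnation, SequenceNumber) triple."""
        if self.seq >= (1 << self.SEQ_BITS):
            # SequenceNumber overflowed its field: start a new incarnation.
            self.incarnation += 1
            self._save_incarnation()
            self.seq = 0
        uno = (self.host_number, self.incarnation, self.seq)
        self.seq += 1
        return uno
```

Because a restarted host always resumes from an incarnation one
greater than the last one it recorded, it can never reissue a
triple, even though its sequence counter restarts at zero.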


The UNO size, 80 bits, was derived from assumptions about
the number of UNOs that could be generated over the lifetime of
the Cronus implementation and the mean rate at which systems
enter and leave a cluster. The current field sizes will allow
a mean generation rate of about 10,000 UNOs per host per second
and a mean crash rate of once every 3 minutes for 100 years;
these numbers are assumed to be adequate for reasonable system
activities.


The principal design consideration for the symbolic name
space is to make it easy for people to use. Names for Cronus
objects are uniform and host independent. Symbolic names are
supported by a catalog that provides a mapping between symbolic
names and the UIDs. This name space is a tree, composed of nodes
and directed labeled arcs. There is a node called the root.
Every other node has exactly one arc pointing to it, and can be
reached by traversing exactly one path of arcs from the root
node. Nodes in the tree generally represent Cronus objects which
have symbolic names. A complete symbolic name begins with the
punctuation mark colon (:), followed by the names of the arcs,
separated by colons. For example, :a:b:c is the symbolic name of
an object.


Not all Cronus objects have symbolic names, and those that
do may have more than one. When an object is given a symbolic
name, an entry is made in the Cronus Catalog, and when the name
for an object is removed, its entry is removed from the Cronus
Catalog. The Cronus Catalog supports Enter, Lookup, and Remove
operations. In addition, operations are provided to read and to
modify the contents of catalog entries.
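
The tree structure and the Enter, Lookup, and Remove operations
can be modelled in a few lines. The Python sketch below is a
hypothetical, single-process illustration: it ignores
distribution, replication, directory objects, and access control,
and its class and method names are not the catalog's interface.
Giving an object a second name simply adds another node carrying
the same UID, which keeps the name space a tree.

```python
class Node:
    def __init__(self, uid=None):
        self.uid = uid   # UID of the object this node names
        self.arcs = {}   # arc label -> child Node


class Catalog:
    """Hypothetical in-memory model of the symbolic name tree."""

    def __init__(self):
        self.root = Node()

    @staticmethod
    def _arcs(name):
        # ":a:b:c" -> ["a", "b", "c"]
        if not name.startswith(":") or name == ":":
            raise ValueError("expected a complete name such as :a:b:c")
        return name[1:].split(":")

    def _parent(self, name):
        *path, last = self._arcs(name)
        node = self.root
        for arc in path:
            node = node.arcs[arc]  # intermediate nodes must already exist
        return node, last

    def enter(self, name, uid):
        parent, last = self._parent(name)
        if last in parent.arcs:
            raise KeyError(f"{name} is already entered")
        parent.arcs[last] = Node(uid)

    def lookup(self, name):
        node = self.root
        for arc in self._arcs(name):
            node = node.arcs[arc]  # KeyError if the name is unknown
        return node.uid

    def remove(self, name):
        parent, last = self._parent(name)
        del parent.arcs[last]
```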


The catalog is distributed; different hosts manage different
parts of the name space. The implementation is logically
integrated, however, so that any catalog manager process can be
asked to perform any of the catalog operations. The upper
portion of the hierarchy is replicated to support efficient
access to different parts of the name space. The symbolic
catalog is implemented out of more primitive directory objects,
which adhere to the general Cronus object paradigm. The Cronus
catalog is described in detail in Section 9.


3.5 The Cronus File System 


The collection of all Cronus files constitutes the Cronus
distributed file system. Within this file system, Cronus
supports several file types. The most basic file is a primal
file, which is stored entirely within a single host and is bound
to that host throughout its lifetime. Other types of Cronus
files are built from primal files. A replicated (or multi-copy)
file, which has multiple instances replicated across Cronus hosts
for increased availability or enhanced responsiveness, is
constructed from several primal files. Therefore, if a host
contributes storage resources to Cronus, it must support primal
files.
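
A replicated file built from primal files might be organized as
below. This Python sketch assumes a naive policy (write to every
image whose host is up, read from the first available image)
chosen only for illustration; the real replication and
consistency mechanisms belong to the Cronus File System design,
and every name here is hypothetical.

```python
class PrimalFile:
    """Stand-in for a primal file bound to one host."""

    def __init__(self, host):
        self.host = host
        self.data = b""
        self.host_up = True  # whether the holding host is reachable


class ReplicatedFile:
    """One logical file built from several primal-file images."""

    def __init__(self, hosts):
        self.images = [PrimalFile(h) for h in hosts]

    def write(self, data):
        # Illustrative policy: update every image whose host is up.
        live = [img for img in self.images if img.host_up]
        if not live:
            raise OSError("no image of the file is available")
        for img in live:
            img.data = data

    def read(self):
        # Read from the first image whose host is up.
        for img in self.images:
            if img.host_up:
                return img.data
        raise OSError("no image of the file is available")
```

The sketch also exposes the weakness of so simple a policy: an
image on a host that was down during a write keeps stale data, so
a real design must reconcile images when their hosts return.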


There is no single table that lists all file objects.
Rather, each file manager owns all of the data for the file
objects it manages. The Cronus object addressing facilities make
possible a client interface in which knowledge of a UID is
sufficient to access the file regardless of its location.

Clients may make file placement decisions themselves if they 
wish. Alternatively, placement decisions will be made 
automatically. 


Ordinary read and write operations may be performed on file
objects. The expected mode of access to Cronus files is to
transfer the file data as needed, much like conventional
filesystem access to disk files. Copies of Cronus files are made
only to satisfy explicit user requests or to support other system
requirements. The design for the Cronus File System can be found
in Section 8.


3.6 Cronus Process Management 


There is more than one type of process object in Cronus.
Primal processes are the simplest process entities. They are
constructed from the process abstraction that exists in the
constituent host operating system. This simple form of process
is used as a building block for the system implementation,
minimizing integration costs for new Cronus host types. Since
primal processes cannot be loaded dynamically with user programs
and lack flexible process control functions, they are too
inflexible to be used as vehicles for general application
programming, but are used as object managers and in other
well-defined system roles.


To satisfy the requirements of application programs, primal
processes are augmented with a subtype, the program carrier
process, which supports a richer process environment. Program
carrier processes can be loaded remotely and started in a manner
that is uniform across the cluster. In addition, program
carriers support, in a host-independent manner, the kind of
flexible control and interconnection of related processes found
in modern operating systems.


Cronus processes have most of the features natural to the
host on which they are built, and no attempt is made to hide
these features. An application builder has the choice of when to
use locally-supported features and when to use standardized
Cronus features. To the extent that applications choose to adopt
Cronus process features, they will be better integrated with
other cluster processing activities. On the other hand, the
judicious use of local features will enhance the efficiency of
the activity. Cronus processes are described in Section 5.


3.7 Device Integration 


Special purpose devices, such as line printers and tape
drives, are important elements in a system configuration.
As Cronus objects, these devices are available to the entire
cluster through an object manager. In some cases, more elaborate
interfaces can provide an access path with specialized features.
For example, a line printer service can be provided that
supports spooling. Device integration is discussed in Section
10.


3.8 User Identities and Access Control 


Users are represented by system objects, known as 
Principals. A principal object contains data that describes the 
manner in which the user may use the system. This information
supports operations such as authentication and session 
initialization. The object manager for the principal objects and 
for other access-related objects is called the Authentication 
Manager. The Authentication Manager component services the 
entire cluster. 


The purpose of Cronus access control is to prevent
unauthorized access to Cronus objects. This is done uniformly by
associating an access control list (ACL) with each object.
Access is then either granted or denied based on the identity of
the principal associated with the accessing agent and the
contents of the access control list for the object.


The operations of the Authorization Manager and the access
control system are discussed in Section 7.
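
The access decision can be sketched as a lookup of the requesting
principal in the object's ACL. The Python fragment below is a
hypothetical illustration: the set of rights, the representation
of an ACL as (principal, rights) pairs, and the function name are
assumptions, and groups of principals are not modelled.

```python
from enum import Flag, auto


class Right(Flag):
    """Hypothetical access rights; Cronus defines its own set."""
    READ = auto()
    WRITE = auto()
    REMOVE = auto()


def check_access(principal, acl, wanted):
    """Grant access only if the ACL gives `principal` every wanted right.

    `acl` is a list of (principal, Right) pairs.
    """
    granted = Right(0)
    for subject, rights in acl:
        if subject == principal:
            granted |= rights
    return (granted & wanted) == wanted
```

A principal that does not appear in the ACL accumulates no rights
and is therefore denied every request.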


3.9 Process Support Library


The Process Support Library (PSL) is a collection of
functions that may be bound into the load image of a Cronus
process.


PSL routines are considered part of the Cronus system and
are generally supplied with the system and maintained by system
programmers. The PSL fills the following major roles:


1. It provides a convenient interface to Cronus operations.


2. It provides access to special Cronus features such as the
facilities which generate UNOs and structure messages,
and to the elementary file system that underlies the
primal file system. It also provides a uniform interface
to the interprocess communication facility. These
features are not normally accessed through the Operation
Switch.


3. It provides COS interface and utility routines necessary
to support the production of portable programs. This
includes format conversion routines and machine-dependent
constants, for example.


3.10 Important Subsystems 


Subsystems are components which use system-provided features
to support user services. Two important subsystems in the
initial implementation of the Cronus system are the user
interface and the monitoring and control subsystem.


The user interface consists of several components, including
the session manager, command interpreter and terminal manager.
The user may gain access to the system from dedicated terminal
access concentrators, from one of the shared hosts, or over the
internet. The interactive processes which are controlled by the
user interface will be distributed across the cluster as required
either by the application itself or under the direction of the
user. A discussion of the user interface may be found in Section
11. In addition, examples of user interaction are shown in
Appendix A (Scenarios of Operation).


The monitoring and control subsystem (MCS) makes it possible
for an operator to monitor and control the entire cluster
configuration from a single console. The functions of the MCS
include starting or restarting parts of the Cronus configuration,
monitoring its facilities and components, and collecting error
reports and statistics. The MCS monitors object managers and 
collects statistics based on a functional decomposition across 
the Cronus configuration rather than a site-based decomposition. 
The monitoring and control design is described in Section 12. 


3.11 The Layering of Protocols in Cronus 


The underlying support for the Cronus cluster architecture
is a high-speed local area network. The Ethernet standard has
been selected as the inter-host transport medium within the
initial Cronus configuration. The Cronus design does not,
however, depend directly on this, so later versions may use a
different local network. Furthermore, the design does use the
DoD standard protocols at higher levels, and requires an
interface between them and the physical local network.


To accomplish these objectives, we have developed a Virtual
Local Network based on DoD Internet Protocol (IP) conventions and
a representative set of local area network capabilities. The
Virtual Local Network is an interhost message transport medium
which is independent of the physical local network.


The Virtual Local Network layer is described in Appendix C.
It provides a primitive datagram service, compatibility with
Internet addressing, and independence from the details of the
physical local network. VLN datagrams can be specifically
addressed, broadcast, or multicast. The VLN guarantees that
datagrams are delivered in order (sequenced) when they are
delivered at all, and that a datagram is received once or not at
all by each intended recipient (non-duplication).
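
One way a receiver can obtain both properties is sketched below:
if each datagram carries a per-sender sequence number, the
receiver delivers a datagram only when it is newer than the last
one delivered from that sender, so lost datagrams are skipped and
late or repeated ones are discarded. This Python fragment is an
illustrative assumption, not necessarily the VLN mechanism; the
sequence-number field and the class name are hypothetical.

```python
class SequencedReceiver:
    """Deliver datagrams in order and at most once, per sender.

    Lost datagrams are simply skipped; nothing is retransmitted.
    """

    def __init__(self):
        self.last_seq = {}  # sender address -> highest sequence delivered

    def accept(self, sender, seq, payload):
        """Return the payload if it may be delivered, else None."""
        if seq <= self.last_seq.get(sender, -1):
            return None  # duplicate or out of order: discard
        self.last_seq[sender] = seq
        return payload
```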


4 Object Management


4.1 Introduction


In this section, we outline the Cronus object model and show
how it is used to structure the kernel of the system. This
discussion consists of the following elements:


o A short discussion of the object model in general, and of
its relationship to Cronus objects.

o A general description of the basic objects that are
included in the first implementations of Cronus.

o The system primitives that Cronus uses to cause
operations to take place on objects.

o The role of special processes, called object managers, in
the implementation of objects.

o The mechanization of the Cronus primitives, and the role
of the operation switch in this mechanization.

o The definition of generic operations that are defined for
all Cronus objects.

o The structure of object managers.


In the course of this section, it will be necessary to refer to
the characteristics of Cronus processes, and to the methods of
communicating between such processes. Those elements of process
management and interprocess communication which are needed for
the understanding of the Cronus object model and for the
construction of object managers will be sketched in this section,
while the details have been placed in Sections 5 and 6.


4.2 General Object Model


There is a considerable and growing literature concerning
object models and object-oriented programming, and it is not our
purpose to describe these methods in detail. On the other hand,
the conceptual framework and terminology of object-oriented
programming and system decomposition has not fully stabilized,
and any system, like Cronus, that claims to use this methodology
is actually selecting from a range of ideas and applying them to
a specific situation: in this case, to the design and
implementation of a distributed operating system.


The basic idea of object-oriented systems is that all
interactions can, at some level, be described in terms of a set
of defined operations on objects. These methods are strongly
associated with the development of the Smalltalk-80 system
[Goldberg 1983], but are also an outgrowth of work in the
manipulation of data abstractions [Liskov 1977], [Robinson 1977],
and recent developments in programming languages. There are
useful, brief introductions to the use of these methods in [Jones
1978], [Weinreb 1981] and [Rentch 1982].


At first glance, one might consider it enough to think of an
object as an instance of a data abstraction. If the internal
structure of the data object is suitably hidden from the outside
world and the proper operations provided to manipulate the
object, we can find out everything we need to know about it and,
equally important, nothing about how the object is actually put
together. This is a strong application of the hiding principle
of software engineering, combined with a set of methods to
examine and modify the part of the data object which is of
interest to the outside world.


The object model is this and more, however. There are
several extensions to this basic idea which have been made in
various systems. One of the most important is inheritance, which
we will discuss below. Another is the addition of objects which
are more than instances of a data abstraction; for example, in
Cronus we have process objects as well as pure data objects.


In Cronus, all the objects which are alike in their
structure and in the operations which they respond to are members
of a Cronus type (in other systems, this is often referred to as
a class). Inheritance describes a relationship between types.
We can say that a particular type is a subtype S of some other
type T. In saying this, we are saying that an instance of the
type S is like an instance of type T in some important way.
Usually this is described by noting that any operation which may
be invoked on an instance of T may also be invoked on an instance
of S. This does not mean that exactly the same procedure will be
applied to exactly the same kind of entity. For example, all
Cronus objects inherit the properties of the basic Cronus object
type CT_Object. There is a set of operations defined on this
type, including Remove, which causes the object to go away. A
very different procedure is used to Remove a primal file object
(whose type is CT_Primal_File) than the one which removes a user
process (whose type is CT_Program_Carrier). But there is some
clear intuitive feeling which we have of what Remove means if we
think of primal files and user processes as objects.



It is worth noting that the inheritance relationship is
rather different from the relationship which one finds in
composite objects. For example, the Authorization Manager
supports the type CT_Group, which is a composite object that is
built out of principals (objects of type CT_Principal, which is a
representation of a system user) and other objects of type
CT_Group. Groups are not subtypes of principals, but are
constructed from them. Some operations that can be invoked on a
principal, such as the ones which manipulate the group expansion
list, have no analogue in the definition of a group, and make no
sense if they are invoked on a group.


The following are the basic object types that constitute the
initial implementation of Cronus:


CT_Object: This is the most basic type, and the generic
operations that create and remove objects and maintain
the access control lists and object descriptors (see
Section 4.4 and Cronus User's Manual object(3)) are
defined for objects of this type. In Cronus this is an
entirely abstract form, and there are no instances of
objects of type CT_Object.


CT_Host: The Cronus system is made up of a series of hosts
which provide services for users. This object has a
process component that creates and manages the primal
processes that, in turn, actually perform the services
and manage the other objects of the system. The CT_Host
object is sometimes called the Primal Process Manager for
the host, because that is its most visible function. The
CT_Host object is closely allied with the operation
switch, which is used to implement the invocation of
operations on objects.


CT_Primal_File: The initial implementation of Cronus supports
files which are bound to a specific host. All ordinary
user data is stored in objects of type CT_Primal_File.
In addition, a number of other object types are
constructed from primal files.


CT_Catalog: The Cronus catalog is made up of a series of entries
which translate symbolic names into the corresponding
UIDs.


CT_Directory: The Cronus catalog entries are organized into
objects of type CT_Directory. These are built from
objects of type CT_Primal_File, but this structure is entirely
hidden from the user by the Catalog Manager.


CT_Principal: A principal is the system's representation of a
user or a system service which requires access to some
other service or object manager. The access control
system depends on identifying the objects of type
CT_Principal which are permitted to carry out an
activity.


CT_Program_Carrier: A program carrier is a process shell that
is prepared to receive a user program. The basic primal
process is too simple an object to be effectively used
for applications, even though it is adequate for
long-lived independent processes like object managers.


There are a number of other object types which are associated
with the Catalog Manager (such as CT_Symbolic_Link) and with the
Authorization Manager (such as CT_Group), but the system could
function without them.


In object-oriented programming, a client invokes operations
on an object, often called the receiver, which is identified by a
UID, ObjectUID(3). The operation itself may be represented as a


In Cronus the basic primitive which causes an operation to be
invoked on an object is InvokeOnHost. This causes Operation to
take place on the object named by ObjectUID on a host at a
specified network address. The operation switch of the Cronus
kernel provides the mechanization of this primitive (see Section
4.5).


While the primitive InvokeOnHost is sufficient to support
the system, the relatively large number of reply messages
suggests that there should be a more efficient method for
answering a request(4). A second message primitive,
SendToProcess, is provided for this purpose. When a message from
a client is delivered, the ProcessUID for the client is included.
The manager may then use SendToProcess to reply directly to the
client.


In a distributed system, the client does not usually know
which host has the object manager which is responsible for a
particular object. Each object must be willing to say whether it
is on a particular host; that is, there is a particular
operation, called Locate, that is among the operations defined
for every object in Cronus. When this operation is invoked on
the object ObjectUID at some HostAddress, the object manager for
that type will reply if it manages that object(5).


(3). There are a few cases in Cronus where objects are
identified by other means; for example, a specific catalog entry
may be identified by the symbolic name which is being
manipulated. The argument presented is analogous, so it is
sufficient to consider the cases where the object actually has a
UID.

(4). If InvokeOnHost is all that is available, the reply must be
passed through the manager of the process to which the reply is
directed.

(5). Actually, if the client wants the negative acknowledgement,
the manager will also reply if it doesn't manage the object.


If the client does not specify the host when invoking the
operation, the PSL performs the required Locate operations to
determine where to send the operation. These Locate operations
are often performed using the broadcast facilities of the VLN.
The PSL (or the client) may cache locations of specific objects
and object managers for increased efficiency. In addition,
primal objects, which are bound to the host which creates them,
can be found quite easily. The PSL looks at the HostAddress
portion of the UID, which contains the address of the host which
generated the UNO portion of the UID. For the current
implementation, the UNO is generated on the host that creates the
object, and that host also currently holds the object if it
still exists.
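
The search order described above (cache first, then the host
named in the UID, then a broadcast) can be sketched as follows.
In this Python fragment, `locate_on_host` and `broadcast_locate`
are hypothetical callbacks standing in for a Locate sent with
InvokeOnHost and a Locate sent by VLN broadcast, and the creating
host is passed in rather than extracted from the UID; none of
these names are PSL functions.

```python
class Locator:
    """Hypothetical client-side locator for object managers."""

    def __init__(self, locate_on_host, broadcast_locate):
        # locate_on_host(host, uid) -> bool: Locate invoked on one host
        # broadcast_locate(uid) -> host or None: Locate sent by broadcast
        self.locate_on_host = locate_on_host
        self.broadcast_locate = broadcast_locate
        self.cache = {}  # uid -> host last known to hold the object

    def find(self, uid, creator_host):
        # 1. A cached location is re-confirmed, since objects may move.
        cached = self.cache.pop(uid, None)
        if cached is not None and self.locate_on_host(cached, uid):
            self.cache[uid] = cached
            return cached
        # 2. Primal objects stay on the host that generated their UNO.
        if self.locate_on_host(creator_host, uid):
            host = creator_host
        else:
            # 3. Fall back to asking every host at once.
            host = self.broadcast_locate(uid)
        if host is not None:
            self.cache[uid] = host
        return host
```

A stale cache entry is dropped when its Locate fails, and the
search falls through to the slower paths.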

Subtype relationships are not a primitive concept in the
implementation of Cronus. There is no direct implementation of
inheritance: there is, instead, a discipline which says that the
manager of each subtype must implement the inherited operations.
Subtype relationships are statically realized in Cronus, through
the cooperation of the object managers and the operation switch.
In addition to simple re-implementation of the inherited
operations (which is used for the generic operations), there are
several static implementation techniques that can achieve
inheritance. A manager may register several type values with the
operation switch, and implement some as subtypes of the others
internally. Alternatively, one manager may invoke another
through the standard mechanisms.
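
The technique of one manager registering several type values and
implementing a subtype internally might look like this. In the
Python sketch below, the type numbers, the assumption that a
single manager handles both primal processes and program
carriers, and all class and method names are hypothetical; the
point is only that the subtype's operation table starts as a copy
of the base type's and then overrides or adds entries.

```python
class Switch:
    """Hypothetical operation switch: type number -> managing object."""

    def __init__(self):
        self.by_type = {}

    def register(self, ctype, manager):
        self.by_type[ctype] = manager

    def invoke(self, ctype, op, uid):
        return self.by_type[ctype].handle(ctype, op, uid)


class ProcessManager:
    """Hypothetical manager registering a base type and one subtype."""

    PRIMAL, CARRIER = 10, 11  # made-up type numbers

    def __init__(self, switch):
        self.objects = set()
        # Operations of the base type.
        base = {"Locate": self.locate, "Remove": self.remove_primal}
        # Subtype: inherit the base table, then override and extend it.
        sub = dict(base, Remove=self.remove_carrier, Load=self.load_program)
        self.ops = {self.PRIMAL: base, self.CARRIER: sub}
        for ctype in self.ops:
            switch.register(ctype, self)

    def handle(self, ctype, op, uid):
        try:
            fn = self.ops[ctype][op]
        except KeyError:
            raise ValueError(f"{op} is not defined for type {ctype}") from None
        return fn(uid)

    def locate(self, uid):
        return uid in self.objects

    def remove_primal(self, uid):
        self.objects.discard(uid)
        return "primal process removed"

    def remove_carrier(self, uid):
        self.objects.discard(uid)
        return "program carrier removed"

    def load_program(self, uid):
        return f"program loaded into {uid}"
```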

4.3 Object Naming


The Cronus object model requires a mechanism for delivering
messages addressed to objects. This mechanism, outlined briefly
in Section 4.2 and described in detail in Section 4.5, is called
the operation switch. The operation switch, in turn, requires
the client to identify the object which is being modified or
examined. The standard identifier for an object is its UID,
which is a bit-string containing 96 bits. This bit string
consists of two components: a unique number (UNO) that is
different for each object which has ever existed in the cluster,
and the Cronus type. It is useful to think of the UID as having
four fields (see Cronus User's Manual uid(4), uno(4)):


HostAddress: the 32-bit Internet address of the host which
created the object. If the object is a primal object,
the HostAddress is also the actual address of the object,

IncarnationNumber: a field containing an integer which is
incremented whenever the host is loaded or reset, or when
the associated SequenceNumber field overflows.

SequenceNumber: a simple counter field which is used to
assure the uniqueness of each UNO that is used to name an
object.

CronusType: the 16-bit integer specifying the Cronus type of
the object.

Between them, the IncarnationNumber and SequenceNumber fields
contain 48 bits, but the subdivision of this string may vary from
host to host; for the hosts in the initial implementation,
each field is 24 bits long.
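
The four-field view of a UID can be expressed as bit packing. The
Python sketch below uses the 32/24/24/16 layout given for hosts
in the initial implementation and assumes, for illustration only,
that the fields are packed in the order listed, with HostAddress
in the most significant bits; the function names are
hypothetical.

```python
HOST_BITS, INC_BITS, SEQ_BITS, TYPE_BITS = 32, 24, 24, 16  # 96 bits total


def pack_uid(host, incarnation, seq, ctype):
    """Pack HostAddress, IncarnationNumber, SequenceNumber, CronusType."""
    uid = 0
    for value, bits in ((host, HOST_BITS), (incarnation, INC_BITS),
                        (seq, SEQ_BITS), (ctype, TYPE_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
        uid = (uid << bits) | value
    return uid


def unpack_uid(uid):
    """Return (host, incarnation, seq, ctype), peeling fields off the low end."""
    fields = []
    for bits in (TYPE_BITS, SEQ_BITS, INC_BITS, HOST_BITS):
        fields.append(uid & ((1 << bits) - 1))
        uid >>= bits
    ctype, seq, incarnation, host = fields
    return host, incarnation, seq, ctype


def uno_of(uid):
    """The UNO is the upper 80 bits: everything except the type."""
    return uid >> TYPE_BITS
```

Since `uno_of` drops the type, two UIDs that differ only in their
type field yield the same UNO.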

It should be observed that the object is actually identified
uniquely by the UNO portion of the UID, and that the Cronus
type is added so the operation switch can find the object
manager. In particular, it is possible to think of an object as
having more than one UID, consisting of the same UNO paired with
different types. The current system does not make any
interesting use of this possibility.


There are also generic (or logical) names, which consist of
a zero UNO and a type field specifying the type of the generic
name. Specific names are used for objects which can be created
and destroyed, and have private state information which is
important to the accessor (e.g., a particular file). Generic
names are used for special purposes. For example, the client can
find out if there is an object manager for a particular type on a
host by performing an InvokeOnHost to Locate the generic name.
Generic names are also used in operations, like Create, in which
there is no object name available; the generic names act like
class objects in other object-oriented systems like Smalltalk, or
like the generic addressing facility in NSW's MSG, which is used
to address an instance of a service.


The PSL provides a pair of functions which convert between a
type name and the generic name for that type (see Cronus User's
Manual uidtype(2)). Generic names, like types, can be referred
to symbolically. By convention, logical names begin with the
prefix "CL_". For example, CL_Primal_File is the generic name for
objects of type CT_Primal_File.


Accessing agents interact with object managers using Cronus
Interprocess Communication. The client may initiate access by
giving either the UID for the object or its symbolic name. The
PSL provides functions which will accept either name.
If the accessing process has the UID of the object, the PSL
simply constructs a message that invokes an operation upon it.
The operation switch delivers the requested operation code, the
UID, and any other parameters to the appropriate object manager.
The object manager consults its fragment of the UID Table to
access the object as necessary to perform the requested
operation. If, on the other hand, the accessing process does not
have the UID, the PSL first consults the Cronus catalog; then,
when it knows the associated UID, it composes the message and
sends it on its way.
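
The PSL behavior just described, accepting either a UID or a
symbolic name, can be sketched as a single entry point. In this
Python fragment, `catalog_lookup` and `send` are hypothetical
callbacks standing in for a catalog Lookup and for the message
the PSL hands to the operation switch; treating any string that
begins with a colon as a symbolic name is an assumption based on
the naming convention in Section 3.4.

```python
def invoke(target, op, catalog_lookup, send, *args):
    """Hypothetical PSL entry point accepting a UID or a symbolic name."""
    if isinstance(target, str) and target.startswith(":"):
        # Symbolic name: consult the Cronus catalog for the UID first.
        uid = catalog_lookup(target)
    else:
        # Already a UID: bypass the catalog entirely.
        uid = target
    return send(uid, op, *args)
```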


This means that the symbolic catalog may be bypassed when an
object is accessed and the accessing process already knows the
UID. This improves performance and enhances the flexibility of
using primitive objects to build complex objects, since the
object manager for a complex object can use the UIDs of its
components directly. The cost of achieving these benefits is
primarily one of increased implementation complexity:


1. Access control is performed in a decentralized fashion by
all of the object managers.


2. Information about objects is distributed among object
managers and catalog managers. Care must be taken to
ensure that the information about an object is
consistent or, if it is not, that the system can still
operate properly.




4.4 Generic Operations On Objects 


The generic operations are defined for all system objects. 
These operations fall into several groups: 




Create and Remove: These bring the object into existence and
destroy it. The operation Create is invoked on the
generic name for the object. These operations must be
defined for all objects.


Locate: If the object exists and is managed by the object
manager which receives the message, the manager replies
that it knows about the object. This operation must be
defined for all objects.


Read_ACL and Write_ACL: These manipulate the access control 
list of the object. These operations must be defined for 
all objects which are separately access controlled. 

There are a few objects whose access is controlled
through another object. For example, objects of type
CT_Catalog_Entry are controlled through the permissions
on the containing object of type CT_Directory.


Read_Sys_Parms, Write_Sys_Parms, Read_User_Parms,
Write_User_Parms: Every object has an associated object
descriptor. The object descriptor contains various 
pieces of information about the object that are made 
visible to the outside through these Read operations, and 
may be modified by the Write operations. Access is 
controlled separately for the User and Sys portions of 
the object descriptor. 


Report_Status: This operation is normally performed on a 
generic name associated with an object type. For 
example, Report_Status is invoked on the generic name 
CL_Primal_File to find out how much space there is 
available on the associated file system. 


For some operations, such as Create, the exact list of parameters
and responses will vary from object type to object type. Other
operations, such as those which operate on the access control
list, perform in the same way for all object types. For details,
see the appropriate sections of the Cronus User's Manual,
especially object(3), acl(3), the descriptions of the objects
below and in Section 3 of the Cronus User's Manual, and the
descriptions of the PSL routines in Section 2 of the Cronus
User's Manual.
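The sketch below shows how a manager might provide the generic operations in one place, so that type-specific managers supply only what differs (typically Create). Everything here is an illustrative assumption: integer UIDs, (status, value) reply tuples, and no access checking. Read_Sys_Parms and Write_Sys_Parms are left out because they would mirror the User versions.

```python
class ObjectManager:
    """Sketch of a manager supporting the generic operations.

    Assumptions (not from the Cronus manuals): UIDs are small integers,
    access checking is omitted, and replies are (status, value) tuples.
    A type-specific manager would subclass this and extend op_Create.
    """

    def __init__(self):
        self.objects = {}     # UID -> {"acl": [...], "sys": {...}, "user": {...}}
        self.next_uid = 1

    def handle(self, operation, uid=None, **params):
        handler = getattr(self, "op_" + operation, None)
        if handler is None:
            return ("NOT_OK", "unsupported operation: " + operation)
        if operation != "Create" and uid not in self.objects:
            if operation == "Locate":
                return ("OK", False)      # this manager does not hold the object
            return ("NOT_OK", "no such object")
        return handler(uid, **params)

    # Create is invoked on the generic name, so there is no target UID yet.
    def op_Create(self, uid, acl=()):
        new_uid, self.next_uid = self.next_uid, self.next_uid + 1
        self.objects[new_uid] = {"acl": list(acl), "sys": {}, "user": {}}
        return ("OK", new_uid)

    def op_Remove(self, uid):
        del self.objects[uid]
        return ("OK", None)

    def op_Locate(self, uid):
        return ("OK", True)               # reached only for objects held here

    def op_Read_ACL(self, uid):
        return ("OK", list(self.objects[uid]["acl"]))

    def op_Write_ACL(self, uid, acl):
        self.objects[uid]["acl"] = list(acl)
        return ("OK", None)

    def op_Read_User_Parms(self, uid):
        return ("OK", dict(self.objects[uid]["user"]))

    def op_Write_User_Parms(self, uid, parms):
        self.objects[uid]["user"].update(parms)
        return ("OK", None)


mgr = ObjectManager()
_, file_uid = mgr.handle("Create", acl=["owner:rw"])
mgr.handle("Write_User_Parms", file_uid, parms={"color": "blue"})
```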
4.5 Object System Implementation
In order to describe the design of the operation switch and
its role in message-oriented interprocess communication, we must
briefly introduce Cronus processes (the Cronus process is
described in detail in Section 5).
Cronus processes are constructed from constituent host
processes (CHPs). The properties of a CHP are defined by the
machine architecture and the constituent host operating system
(COS). The Cronus process is constructed from one or more CHPs,
with the addition of Cronus process features. The simplest type
of Cronus process is the primal process (PP). A primal process
is a CHP which can invoke operations on objects through the
Cronus Interprocess Communication facility and can be controlled
by the Primal Process Manager. In addition, a primal process can
use the Cronus primitive Receive to receive messages sent through
the Cronus IPC by either InvokeOnHost or SendToProcess.
The implementation of Receive employs CHP-specific
synchronization facilities, described in the appendixes on the
interface to the COS, to build an asynchronous Receive operation.
This section describes the framework of the object system
implementation on Cronus hosts. Figure 4.1 illustrates the
relevant components on a single host. The boxes in the figure
represent abstract modules of the implementation, and do not
necessarily map one-to-one into CHPs or address spaces.
| Primal Process | | Primal File | | Program Carrier | | Program |
|    Manager     | |   Manager   | |     Manager     | | Carrier |


                   |  Operation  |
                   |   Switch    |


                   |   Message   |
                   |   Service   |


Figure 4.1 Object System Components


In Figure 4.1, boxes 1-4 are Cronus process objects; box 5
is the operation switch, which accepts messages from and delivers
messages to the Cronus processes on this host; box 6 is the IP
protocol demultiplexing service; and box 7 is the Virtual Local
Network layer.
The operation switch is table-driven. Its table contains
routing information that the operation switch uses to direct
messages from process to process. The sender and receiver may
both be on a single host, or the message service may be involved
in a host-to-host message transfer. The operation switch does
not retain information about the messages, although it may gather
statistics and transmit them to the Monitoring and Control System
(see Section 12).


Since the invoker can request reliable message transport,
and ordinarily does so for InvokeOnHost applied to a specific
host address, a failure of an operation invocation is not likely
to be due to a transient communication fault; with high
probability, either the network or the target host, or both, are
down (see Section 6 for a detailed description of the IPC and
these services).


The invocation sequence for an operation is:


o The Cronus Process Support Library (PSL), which is the
component of the system that appears within the client
process, formats a message which contains the name of the
object, the operation, its parameters, and other
information which is needed by the system.


o The message, which is marked as an invocation of the
operation, is handed to the local host's operation
switch. If HostAddress specifies the local host, it
processes the message itself; otherwise, it forwards the
message to the specified host. (These functions are
directly supported by the Cronus Interprocess
Communication facility, which is described in detail in
Section 6.)


o The receiving operation switch examines the ObjectUID,
determines the type of the object, and hands the message
to the object manager for that type, if there is one.


o The object manager for the object type then performs the
processing associated with the operation and its
parameters.


o Although it is not necessary for an operation to follow a
request-reply paradigm, most do. If a reply is needed,
the object manager prepares a message that is returned
using the SendToProcess primitive.
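The five steps above can be traced in a toy simulation of two hosts. The sketch rests on several assumptions: a UID is modeled as a (host, type, serial) tuple, a Python dictionary stands in for the message service, and each manager is a plain function that is assumed to exist for the target type. None of these names come from the Cronus manuals.

```python
from collections import namedtuple

# Hypothetical representations chosen for the sketch: a UID records the
# host it lives on and its type; a message carries a kind tag.
UID = namedtuple("UID", "host type_name serial")
Message = namedtuple("Message", "kind target_uid operation params sender_uid")


class OperationSwitch:
    """Toy operation switch; `network` stands in for the message service."""

    def __init__(self, host, network):
        self.host = host
        self.network = network   # host address -> OperationSwitch
        self.managers = {}       # type name -> manager function
        self.queues = {}         # process UID -> messages awaiting Receive
        network[host] = self

    def submit(self, host_address, msg):
        # Step 2: handle locally, or forward to the switch on host_address.
        target = self if host_address == self.host else self.network[host_address]
        target.deliver(msg)

    def deliver(self, msg):
        if msg.kind == "SendToProcess":
            # Replies are queued for the target process, bypassing managers.
            self.queues.setdefault(msg.target_uid, []).append(msg)
            return
        # Step 3: map the object's type to the manager for that type.
        manager = self.managers[msg.target_uid.type_name]
        # Step 4: the manager performs the operation.
        result = manager(msg)
        # Step 5: if a reply is needed, return it with SendToProcess.
        if result is not None:
            reply = Message("SendToProcess", msg.sender_uid, msg.operation, result, None)
            self.submit(msg.sender_uid.host, reply)


def invoke_on_host(switch, host_address, object_uid, operation, sender_uid, **params):
    # Step 1: the PSL formats the invocation and hands it to the local switch.
    switch.submit(host_address, Message("Invoke", object_uid, operation, params, sender_uid))


network = {}
host_a, host_b = OperationSwitch("A", network), OperationSwitch("B", network)
host_b.managers["CT_Primal_File"] = lambda m: {"status": "OK", "op": m.operation}
client = UID("A", "CT_Primal_Process", 7)
invoke_on_host(host_a, "B", UID("B", "CT_Primal_File", 1), "Read", client)
```

After the call, the reply sits in host A's queue for the client process, ready for a Receive.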


Figure 4.2 illustrates the transmission of an operation from
the invoking process, through the local operation switch, to the
remote operation switch, and finally to the receiving process.
This section describes the calls and the representation of data
structures at the interfaces 1, 2, and 3.


| Invoking |-(1)->|  Local  |-(2)->| Remote  |-(3)->| Receiving |
| Process  |      |   OS    |      |   OS    |      |  Process  |


Figure 4.2 Operation Switch Interfaces


When the client performs an InvokeOnHost primitive on a
Cronus object, a message is generated that is ultimately directed
to a manager process and accepted by a Receive in that process.
Information crosses interfaces (1) and (3) by means of Cronus
system calls, which are representations of the primitive
functions, made by the invoking and receiving processes; these
calls may be represented as:


InvokeOnHost(TargetAddress, ObjectUID, Operation)


Receive(SourceAddress, SenderUID, ObjectUID, Operation)


where the function parameter Operation includes both the intended
operation and its parameters (6).


(6). The calling sequences for these functions have been
modified for purposes of presentation clarity; see the Cronus
User's Manual send(2) and receive(2) for a description of the
actual calling sequences.
Interface (2) is peer-to-peer communication between
operation switches, which is discussed in greater detail in
Section 6. Messages exchanged between operation switches are
octet sequences. The Operation parameter of the InvokeOnHost
call is not interpreted by the operation switch, and is treated
simply as data to be moved. The message has several header
fields that are visible to both operation switches; these include
the UID of the object being operated upon (ObjectUID) and that of
the client (ProcessUID).


When the InvokeOnHost message arrives at the target host,
the operation switch tries to map the object's type to a manager
process on the host. The table of possible destinations consists
of a list of generic UIDs for ordinary managers and specific UIDs
for objects which are managed separately (7). The operation switch
first checks the ObjectUID against the list of specific UIDs,
then the Type field against the list of generic UIDs. If the
mapping is not successful, the invocation is discarded, but an
exception reply is generated. If the mapping is successful, the
message is transmitted to the manager process. The manager
obtains the information by initiating an ordinary Receive
request; when the Receive completes, the SourceAddress,
InvokerUID, ObjectUID, and Operation have been made available to
the manager process.
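The two-stage lookup can be sketched as below: specific UIDs are checked first, then the type is matched against the generic entries. The table layout, the string manager names, and the type name CT_Terminal are hypothetical. CT_Terminal only stands in for a separately managed object such as the virtual terminal of footnote (7).

```python
from collections import namedtuple

# Hypothetical UID layout for the sketch: home host, type, serial number.
UID = namedtuple("UID", "host type_name serial")


class DestinationTable:
    """Sketch of an operation switch's destination table.

    `specific` maps object UIDs that are managed separately to a process;
    `generic` maps a type (standing in for its generic UID) to the
    ordinary manager for that type.
    """

    def __init__(self):
        self.specific = {}
        self.generic = {}

    def route(self, object_uid):
        # 1. Separately managed objects are matched by their specific UID.
        if object_uid in self.specific:
            return self.specific[object_uid]
        # 2. Otherwise, match the Type field against the generic entries.
        return self.generic.get(object_uid.type_name)


def switch_invocation(table, object_uid, operation):
    manager = table.route(object_uid)
    if manager is None:
        # The invocation is discarded, but the invoker gets an exception reply.
        return ("EXCEPTION", "no manager for " + object_uid.type_name)
    return ("DELIVERED", manager, operation)


table = DestinationTable()
table.generic["CT_Primal_File"] = "primal-file-manager"
terminal = UID("A", "CT_Terminal", 3)          # hypothetical type name
table.specific[terminal] = "user-interface-process"
```

Checking the specific list first lets a separately managed object override whatever generic manager its type might otherwise map to.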


Although one can reply by invoking the Send operation on the
object ProcessUID, replies are usually sent by means of the
alternative SendToProcess primitive. This primitive hands
messages addressed to a specific process across interface (1).
The operation switch then marks the message which it ships across
interface (2) as a SendToProcess message. The receiving
operation switch then places the message on the queue for the
target process, bypassing its object manager. The delivery
mechanism, Receive, is independent of the transmission mode of
the original message.


(7). Currently, the only example of such a separately managed 
object is the virtual terminal in the user interface (see Section 
11). 
4.6 Object Manager Structure


Object managers are asynchronous independent processes.
They are asynchronous because they interleave the processing of
messages. An object manager often invokes operations on other
objects to satisfy the requests it receives; it does not wait for
the reply to such a request, but moves on to the next request or
to a reply from a previous operation. They are independent
processes because they are daemon processes which are started by
the system (or its monitoring and control section) or by another
daemon process. They receive messages, originate requests to
satisfy the client requests, and reply to the original messages.


The asynchronous character of the object manager has a
significant impact on its structure. Managers receive messages
which cause them to undertake actions. These actions may be of
two types. The first type occurs entirely within the manager's
own address space (or within a single Cronus process that may
consist of more than one COS process), and is called a local
action. The second type requires the manager to perform one or
more operations, called secondary requests, on objects that it
does not manage. The manager must be able to keep track of a
number of these actions, because it cannot wait for the response
from a secondary request before it accepts its own next request.
The processing that comprises the operation is therefore divided
into portions that are performed before and after the secondary
request is issued. When the manager issues the secondary request,
it saves the components of its state that are needed to complete
the processing when the reply arrives.


There are a number of common elements in the construction of 
object managers. 


A manager normally consists of an initialization section and
a main loop which is driven by the arrival of requests
through the Cronus interprocess communication facility.
Since a manager normally runs forever (until the system
crashes), there may not be code for wrap-up.


The manager parses incoming messages, and dispatches on the
message class, which takes on the values Request, Reply, and
inProgress.

A new Request message causes the manager to set up a control
block for the operation.


A Reply message causes the manager to identify the control 
block associated with the message, and to continue 
processing as required by that message. 


In the case of a local action, the manager receiving the
message will (normally) process the request to completion and
compose a reply to the originating process.


If a secondary request is necessary, the situation is
similar to that found at the originator. A request can be put
into the form:


InitialPortion
Op(Obj) --> Reply
PostProcessing


That is, a secondary request is basically some operation (Op) on
an object (Obj) which generates a Reply. Before we invoke this
operation, we usually have some initialization beyond composing
the message (InitialPortion), and after we get the reply, we often
need to do some PostProcessing.


The procedure that invokes the operation also creates a
control block that contains the information required for reply
processing. After it passes the invocation to the IPC mechanism,
it returns without waiting. The manager then processes the next
IPC message (which may be a Reply from a secondary request, or a
new Request), if there is one available. Otherwise, it goes to
sleep until the next message arrives (see Section 6 and
ipcmisc(2) in the Cronus User's Manual for details). When a
Reply for a secondary request arrives, the manager finds the
control block associated with it, and performs the reply
function. When the reply processing returns normally, the
PostProcessing routine is invoked if the message is marked OK,
and an alternate error-handling routine is invoked if the message
is marked NOT_OK.
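The control-block discipline described above can be sketched as a manager whose main loop handles one message at a time and never blocks on a secondary request. Messages are modeled as dictionaries, and `ipc_send` stands in for the IPC primitives. The example operation, a request that must first read another object's system parameters, is hypothetical.

```python
import itertools


class AsyncManager:
    """Sketch of an object manager that never blocks on secondary requests.

    Assumptions: messages are dicts with a "class" field (Request, Reply,
    or inProgress); `ipc_send` stands in for the Cronus IPC primitives.
    """

    def __init__(self, ipc_send):
        self.ipc_send = ipc_send
        self.pending = {}                 # request id -> control block
        self.ids = itertools.count(1)

    def handle(self, msg):
        """One pass of the main loop: dispatch on the message class."""
        if msg["class"] == "Request":
            self.start(msg)
        elif msg["class"] == "Reply":
            block = self.pending.pop(msg["id"])     # find the control block
            routine = block["post"] if msg["status"] == "OK" else block["error"]
            routine(block["state"], msg)
        # inProgress notices need no action in this sketch.

    def secondary_request(self, op, obj, state, post, error):
        # Save the state PostProcessing will need, send, and return at once.
        req_id = next(self.ids)
        self.pending[req_id] = {"state": state, "post": post, "error": error}
        self.ipc_send({"class": "Request", "id": req_id, "op": op, "obj": obj})

    def start(self, msg):
        # InitialPortion for a hypothetical operation that needs another
        # object's system parameters before it can reply.
        self.secondary_request("Read_Sys_Parms", msg["obj"], {"client": msg["sender"]},
                               post=self.finish, error=self.fail)

    def finish(self, state, reply):
        # PostProcessing: complete the original request.
        self.ipc_send({"class": "Reply", "to": state["client"],
                       "status": "OK", "body": reply["body"]})

    def fail(self, state, reply):
        # Alternate error-handling routine for NOT_OK replies.
        self.ipc_send({"class": "Reply", "to": state["client"],
                       "status": "NOT_OK", "body": reply["body"]})


outbox = []
mgr = AsyncManager(outbox.append)
mgr.handle({"class": "Request", "sender": "c1", "obj": "f1"})
mgr.handle({"class": "Request", "sender": "c2", "obj": "f2"})   # accepted without waiting
mgr.handle({"class": "Reply", "id": 2, "status": "OK", "body": 10})
mgr.handle({"class": "Reply", "id": 1, "status": "NOT_OK", "body": "no such object"})
```

The replies arrive out of order. Each one is still matched to its control block by request id, so neither client waits on the other.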


The independent character of the object manager principally
affects the way errors are handled. When a process is
interactive, it makes some sense to report the error to the user.
If an independent process detects an error condition, it may be
necessary to report the error to the client that issued the
request, to the monitoring and control station (MCS, see Section
12), or to both. In addition, Cronus managers keep statistics on
the kinds of errors which have been detected, and report them to
the MCS periodically.
mathe : 

A manager that encounters a failure during an operation,
particularly when there are secondary operations involved, must
take steps to assure that the information which is retained
across host crashes (the permanent state of the system) and any
internal status information (the temporary state of the system)
are correct and consistent.

Changes in the permanent state of the system are made by
atomic transactions. If it is necessary to make several changes
in the recorded data to perform an operation, the manager that
receives the operation assures the client that all the changes
will take place or none of them will. That is, in the case of a
failure, the atomic transaction mechanism either forces the
transaction to completion by carrying out the intentions which
have been posted, or undoes those portions of the intentions list
(see Cronus User's Manual intent(2)) already marked as performed.
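The two recovery paths can be sketched with a dictionary standing in for permanent storage: a posted transaction is rolled forward, and an unposted one is rolled back. This illustrates the idea only and is not the intent(2) interface. Treating each intention as a single key/value write is an assumption.

```python
_MISSING = object()   # marks a key that did not exist before the transaction


class IntentionsList:
    """Sketch of intentions-list recovery (not the intent(2) interface).

    Assumptions: `store` is a dict standing in for permanent storage, each
    intention is a single key/value write, and `posted` models the point at
    which the whole list has been durably recorded.
    """

    def __init__(self, store):
        self.store = store
        self.entries = []      # each entry: [key, new_value, old_value, performed]
        self.posted = False

    def intend(self, key, value):
        self.entries.append([key, value, self.store.get(key, _MISSING), False])

    def post(self):
        self.posted = True

    def perform_next(self):
        """Carry out the next unperformed intention; False when none remain."""
        for entry in self.entries:
            if not entry[3]:
                self.store[entry[0]] = entry[1]
                entry[3] = True
                return True
        return False

    def recover(self):
        """After a failure: finish a posted transaction, else undo it."""
        if self.posted:
            while self.perform_next():
                pass
            return
        for entry in reversed(self.entries):
            key, _new, old, performed = entry
            if performed:
                if old is _MISSING:
                    del self.store[key]
                else:
                    self.store[key] = old
                entry[3] = False


rolled_back = {"balance": 1}
t1 = IntentionsList(rolled_back)
t1.intend("balance", 2)
t1.intend("log", "debit")
t1.perform_next()          # failure strikes before the list is posted
t1.recover()

rolled_forward = {"balance": 1}
t2 = IntentionsList(rolled_forward)
t2.intend("balance", 2)
t2.intend("log", "debit")
t2.post()
t2.perform_next()          # failure strikes after the list is posted
t2.recover()
```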


When a manager (or any other process, for that matter) is
carrying out a composite action consisting of more than one
operation on one or more objects, there are often other changes
in temporary state which must be undone if an error is detected.
The process maintains a work-in-process list that contains an
entry for each action that is not yet complete. For example, if
a process has acquired locks on several files, and discovers that
an additional lock which is needed cannot be acquired, the
original set must be released. The work-in-process list also
contains entries for additional special processing that is
required if the action does not complete (see Cronus User's
Manual wip()).
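The lock example above can be sketched with a work-in-process list whose entries are cleanup actions. The lock table is a plain dictionary and the function names are hypothetical. The sketch does not model the wip() routines of the Cronus User's Manual.

```python
class WorkInProcess:
    """Sketch of a work-in-process list (not the wip() interface).

    Each entry is a cleanup action for a step that has been done but
    whose enclosing composite action has not yet completed.
    """

    def __init__(self):
        self.entries = []

    def add(self, cleanup):
        self.entries.append(cleanup)

    def complete(self):
        self.entries.clear()           # success: nothing needs undoing

    def abort(self):
        while self.entries:
            self.entries.pop()()       # undo in reverse order


def lock_all(names, lock_table, owner):
    """Lock every named file, or release the locks already taken and fail."""
    wip = WorkInProcess()
    for name in names:
        if name in lock_table:         # a needed lock is held elsewhere
            wip.abort()
            return False
        lock_table[name] = owner
        wip.add(lambda n=name: lock_table.pop(n))
    wip.complete()
    return True


locks = {"c": "other-process"}
first = lock_all(["a", "b", "c"], locks, "me")   # fails on "c"
after_failure = dict(locks)
second = lock_all(["a", "b"], locks, "me")
```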
5 Process Management


5.1 Introduction


Processes are the active portion of any system. Each host
and constituent operating system in a Cronus cluster has at least
one natural concept of the process. More generally, several
different kinds of processes are present in each host, fulfilling
different roles. In the absence of a distributed operating
system, the processes on two hosts are unrelated to each other.
This section describes how Cronus processes work and how they
communicate with each other. The details of how processes are
constructed from constituent host processes (CHPs) are discussed
in Appendixes D, E, and F. In the following discussion, it is
usually safe to visualize a Cronus process as being built from a
single CHP with the addition of an object descriptor and some
specialized facilities which make Cronus work. The actual
implementation, however, might be quite different: a Cronus
process might be made up of several CHPs, or a CHP might include
more than one Cronus process (8).
If we wish to build a system of cooperating processes on a
cluster of computers, and to use it as a base for a distributed
operating system, we must do the following:


o Define a standard method for communicating among the
processes. Cronus treats processes as objects, and uses
the standard Cronus IPC facility and the primitives
InvokeOnHost and SendToProcess for all interprocess
communication. All procedures developed for structuring
and parsing messages for operations on objects, such as
those described in Section 6, may be used for
manipulating process objects as well.


o Establish mechanisms for creating and controlling
processes on hosts of different sorts. Again, since
Cronus processes are objects, this reduces to the
definition of the operations which may validly be applied
to the process objects.


o Provide a method for organizing the process objects to
perform tasks. This is accomplished by defining other
objects which reflect the required organization. The
collection of processes on a host, for example, is
represented by an object of type CT_Host, which will be
described below. Another example is the set of processes
that make up a user session, which is represented by an
object of type CT_Session_Data (see Section 11).


(8). In fact, a Cronus process might even span hosts. In the
current system design, all Cronus processes are primal processes;
that is, they are bound to a single host. Later implementations
may relax this restriction.


The following three Cronus types are discussed in this 
section: 


o CT_Host: the organizing object for the primal processes
associated with a physical host.


o CT_Primal_Process: the most fundamental type of process.
Object managers are normally constructed from processes
of this type.


o CT_Program_Carrier: a subtype of CT_Primal_Process that
has augmented process control facilities that make it
more suitable for implementing user processes.


There is one object of type CT_Host associated with each physical
host, and it is the object manager of the processes of type 
CT_Primal_Process on that host. It is responsible for starting 
up Cronus services, which are also object managers for the basic 
system objects; it is also responsible for gathering the 
information which the operation switch needs to route messages to 
the other object managers and to specific processes when the 
primitive SendToProcess is used. 


There are two basic Cronus process types, CT_Primal_Process
and CT_Program_Carrier (9). The type CT_Program_Carrier is a
subtype of CT_Primal_Process. Ordinary primal processes lack
essential process control functions and other desirable
characteristics needed for application programming. The subtype
CT_Program_Carrier provides an environment tailored to the
requirements of application programs.


(9). Future system versions will introduce additional process
types which may be distributed in extent and have special
reliability properties.
Primal processes and program carriers never migrate; once
created, the process remains on the same host until it is
destroyed. The HostAddress in a UID for a primal process or
program carrier tells where the process is, so an operation
switch can tell exactly where to deliver a message addressed to
it.
Every host participating in the system must support an
object of type CT_Host, which is also referred to as a Primal
Process Manager (PPM), and primal processes. In their minimal
forms, the host object and primal processes are relatively
simple. This keeps the cost of integrating a host type into a
Cronus cluster low for those minimally integrated hosts that can
obtain system services from other hosts, but do not provide
system services.
A primal process which plays a well-defined functional role
within the system is called a Cronus service. Cronus services
are object managers for system-defined object types, for example,
a Primal File Manager or a Program Carrier Manager.


Cronus processes may make use of some or all of the
functions in the Process Support Library (PSL), which provides
high level interfaces to many system functions as well as general
purpose utilities for interfacing to and manipulating the Cronus
environment. Portability is a major goal for the PSL, so that it
can be implemented readily in whole or in part on new host types.
The PSL is discussed further in Section 5.4.


5.2 Objects of type CT_Host 


The basic organizational elements of Cronus are objects of
type CT_Host. These objects correspond to the intuitive physical
hosts that make up the Cronus cluster. A CT_Host object consists
of the Primal Process Manager for the host and the basic
tables which are used by the operation switch in routing
operation invocations. In some sense, it is reasonable to think
of the operation switch itself as a part of CT_Host. When a host
joins the Cronus network, only the lowest level of network
software is functioning; the Monitoring and Control System (see
Section 12) engages in a dialog with this primitive host element,
and brings up the object CT_Host. The MCS is therefore the
object manager for the objects of type CT_Host.


The Primal Process Manager (PPM) component of a CT_Host
object implements operations concerning primal processes as a
class. The tables that identify the object managers and
processes that are on a particular host, and that therefore are
used to implement the Cronus primitives InvokeOnHost and
SendToProcess, are maintained by the Register and Delete
operations on the CT_Host object.


In addition to the generic operations (see Cronus User's
Manual object(3)), the following operations are defined on
objects of type CT_Host (see Cronus User's Manual cr_host(3)):


Cronus_Restart
Service_List
Process_List
Register
Delete


The Cronus_Restart operation is used to shut down all
activity on the CT_Host object. It removes all active processes,
including the process implementing the CT_Host object itself.
After a Cronus_Restart, the host is in a state from which it may
be bootstrapped.


The Service_List operation is used to find out what kinds of
service the host is prepared to support, and which ones are in
fact being supported. The names of these services, which are
called role designators, are used to start primal processes that
perform the service (see Section 5.3).


The Process_List operation tells what processes are active
and what roles they are playing; this is the information which
the operation switch has about processes active on this host.
Whenever a process is created or removed, the tables must be
updated. These tables contain the following entries:
o generic names for objects paired with the specific UID of
the Cronus process;


o specific UIDs for process objects that will receive
messages through SendToProcess; and


o specific UIDs for those objects whose manager cannot be
identified by reference to a generic name (see Section
11).


The tables also contain any COS-specific information needed to
communicate with the process. They are automatically updated for
processes which are created by the CT_Host object itself, such as
the object managers. Other processes are created by other
managers, for example, the program carrier manager. These inform
the CT_Host of changes through the Register and Delete
operations.
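The sketch below shows one way the three kinds of table entries, and the Register and Delete operations that maintain them, might fit together. The class, its method names, and the string UIDs are hypothetical. Process_List is modeled as a mapping from each active process to the generic name it serves, if any.

```python
class HostTables:
    """Sketch of the tables a CT_Host keeps for its operation switch.

    Assumptions: UIDs and generic names are plain strings, and the
    COS-specific delivery information is an opaque dict.
    """

    def __init__(self):
        self.generic = {}     # generic name -> UID of the managing process
        self.processes = {}   # process UID -> COS-specific info (SendToProcess targets)
        self.specific = {}    # separately managed object UID -> managing process UID

    def register(self, process_uid, cos_info, generic_name=None, managed_objects=()):
        self.processes[process_uid] = cos_info
        if generic_name is not None:
            self.generic[generic_name] = process_uid
        for object_uid in managed_objects:
            self.specific[object_uid] = process_uid

    def delete(self, process_uid):
        # Remove every table entry that routes to the departing process.
        self.processes.pop(process_uid, None)
        self.generic = {g: p for g, p in self.generic.items() if p != process_uid}
        self.specific = {o: p for o, p in self.specific.items() if p != process_uid}

    def process_list(self):
        """Active processes and the role (generic name) each plays, if any."""
        roles = {p: g for g, p in self.generic.items()}
        return {p: roles.get(p) for p in self.processes}


host = HostTables()
host.register("pp-1", {"pid": 101}, generic_name="CL_Primal_File")  # started by the PPM
host.register("pp-2", {"pid": 202}, managed_objects=["vt-1"])       # separately managed object
host.register("pc-3", {"pid": 303})                                  # registered by another manager
host.delete("pp-2")
```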


5.3 The Operations on Objects of Type CT_Primal_Process


Objects of type CT_Primal_Process are among the most basic
in Cronus. The three system primitives (InvokeOnHost,
SendToProcess, and Receive) are defined for these objects. In
addition, the generic operations (see Section 4.4 and Cronus
User's Manual object(3)) are defined. The particular
characteristics of these operations, when invoked on primal
process objects, are described in detail in the Cronus manual
(see Cronus User's Manual p_process(3)).


The Create operation takes a role designator as an argument,
and starts a new primal process performing this role. The role
designator may be in one of the following forms:


1. A Cronus generic UID name for the service.


2. A Cronus symbolic service name. These are character
strings containing the literal characters of a logical
name, for example "CL_Primal_File".


3. A host dependent role designator. These are arbitrary
strings, which have meaning only to the PPM on a specific
host.


Role designators of kinds (1) and (2) are paired, and are
registered with the Cronus system administrator as the names of
standard Cronus functional units. The allowable list of role
designators of these kinds for a particular host object may be
obtained by invoking the operation Service_List on the object.
These primal processes are automatically registered, which makes
the logical name known to the operation switch on the host, so
that the process can be generically addressed.


Designators of kind (3) provide for the activation of
host-specific programs or devices. The host dependent role
designator might be a COS-dependent file that is executed as a
result of the Create operation. Primal processes created with a
host-dependent role designator generally have no associated
logical name, and cannot be generically addressed.
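The handling of the three designator kinds in Create can be sketched as follows. Several representations are assumptions made for the illustration: generic UIDs are integers, symbolic names are "CL_" strings, and host-dependent designators are keys naming COS files. The file path and the UID format are hypothetical too.

```python
from collections import namedtuple

PrimalProcess = namedtuple("PrimalProcess", "uid designator logical_name")


class PrimalProcessManager:
    """Sketch of Create's handling of the three role designator kinds.

    Assumptions: generic UIDs are ints, symbolic names are "CL_" strings,
    and host-dependent designators map to hypothetical COS file paths.
    """

    def __init__(self, services, host_programs):
        self.services = dict(services)            # generic UID -> symbolic name (kinds 1, 2)
        self.host_programs = dict(host_programs)  # host-dependent designator -> COS file
        self.generic_table = {}                   # logical name -> process UID
        self.serial = 0

    def service_list(self):
        return sorted(self.services.values())

    def create(self, designator):
        if designator in self.services:                  # kind 1: generic UID
            logical_name = self.services[designator]
        elif designator in self.services.values():       # kind 2: symbolic name
            logical_name = designator
        elif designator in self.host_programs:           # kind 3: host dependent
            logical_name = None                          # not generically addressable
        else:
            raise ValueError(f"unknown role designator {designator!r}")
        self.serial += 1
        uid = f"pp-{self.serial}"
        if logical_name is not None:
            self.generic_table[logical_name] = uid       # automatic registration
        return PrimalProcess(uid, designator, logical_name)


ppm = PrimalProcessManager({0x101: "CL_Primal_File"}, {"plotter": "/usr/local/bin/plotd"})
by_uid = ppm.create(0x101)
by_name = ppm.create("CL_Primal_File")
local = ppm.create("plotter")
```

In this sketch, a second process started for the same service replaces the first in the generic table. A real PPM would need some explicit policy for that case.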


The primal process will initialize its state entirely from
non-volatile storage (local or remote disks). 


A process may invoke any operations on itself as the target 
object. A process may send itself messages, remove itself, or
read or change its descriptor in the same way it performs these 
operations on other objects. 


The operations defined on primal processes provide process
control functions. For example, Remove is invoked to "destroy"
or "kill" the process. It erases all record of the process state
from the system and frees any resources dedicated to the process.


A process which is removed is not notified of the operation,
and has no opportunity to terminate cleanly. Only the resources
actually used to implement the process object are freed;
resources held as a result of the computational activity of the
process (e.g., locks on remote files) are not freed. Some primal
processes may possess dedicated resources, and Remove disables
the process without releasing these resources.


A reply will be generated to the invoker to indicate that 
the process has been removed. After receiving the reply, the 
invoker knows that operations using the UID of the process will
not succeed. 


The process descriptor is the object descriptor portion of
the Cronus process. It is useful to think of the process
descriptor as a list of (key, value) pairs, in the sense of the
MSL (see Section 6.2 and the list of standard key names in the
Cronus User's Manual keys(4)). Some of the values implement
process control. For example, the pair (Key_Priority, 5) would
indicate the importance of a process relative to other processes
for competing resources. Some keys must be present in the list
("required keys"), while others are optional (see Cronus User's
Manual p_process(3), process(4)).


All process objects must respond to the required keys in a
uniform way. If an object supports a standard optional key, the
process must apply it in a uniform, system-wide manner.
Additional, elective keys may be present. Their interpretation
is not specified by Cronus, but is the responsibility of the
process and the other processes with which it interacts.
Currently, the required keys for Primal Processes are
Key_MyUID, Key_MyAGS, and Key_IPCEnabled.


The value associated with Key_MyUID is placed in the
descriptor when the process is created, and is never changed
thereafter. It is the specific UID of the process, and has type
CT_Primal_Process (or CT_Program_Carrier, in the case of program
carrier objects).
The value of Key_MyAGS is the access group set, used with
access control lists to determine access rights to objects at
operation invocation time. The initialization and use of access
control and authentication data is discussed in detail in
Section ….


The value of Key_IPCEnabled controls communication through
the operation switch. If the value is true, the process can send
and receive messages in the normal fashion. If it is false, the
process may not send or receive messages, or invoke operations on
Cronus objects. This feature can be used for managing access to
network resources.


Currently, the only optional key defined for a Primal
Process is Key_Priority, but others may be defined later.


The generic operations on object descriptors permit a
process to inspect or modify the descriptor of another process.
If several processes invoke these operations on another process
at the same time, the effect will be as if the operations were
processed sequentially, i.e., they are atomic with respect to
each other.
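The descriptor rules above can be sketched with a single lock per descriptor: required keys, an immutable Key_MyUID, uninterpreted elective keys, and atomic Read/Write. The class and the elective key names are hypothetical. A real manager would also apply access checks based on Key_MyAGS, which this sketch omits.

```python
import threading


class ProcessDescriptor:
    """Sketch of a process descriptor held as (key, value) pairs.

    Assumptions: keys are strings, the access group set is a frozenset,
    and one lock per descriptor makes Read/Write atomic with respect to
    each other. Elective keys are stored without interpretation.
    """

    REQUIRED_KEYS = ("Key_MyUID", "Key_MyAGS", "Key_IPCEnabled")

    def __init__(self, my_uid, ags, ipc_enabled=True):
        self._lock = threading.Lock()
        self._pairs = {"Key_MyUID": my_uid, "Key_MyAGS": frozenset(ags),
                       "Key_IPCEnabled": ipc_enabled}

    def read(self, keys):
        with self._lock:
            return {k: self._pairs[k] for k in keys if k in self._pairs}

    def write(self, updates):
        with self._lock:
            if "Key_MyUID" in updates:
                raise PermissionError("Key_MyUID never changes after creation")
            self._pairs.update(updates)

    def has_required_keys(self):
        with self._lock:
            return all(k in self._pairs for k in self.REQUIRED_KEYS)

    def ipc_enabled(self):
        return self.read(["Key_IPCEnabled"])["Key_IPCEnabled"]


desc = ProcessDescriptor("pp-1", {"staff"})
desc.write({"Key_Priority": 5})          # the standard optional key
writers = [threading.Thread(target=desc.write, args=({f"Key_Elective_{i}": i},))
           for i in range(8)]            # concurrent writers of elective keys
for t in writers:
    t.start()
for t in writers:
    t.join()
desc.write({"Key_IPCEnabled": False})    # the process may no longer use the switch
```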


Since the CT_Host object is implemented by a Primal Process,
these process control operations apply to it. One of the
operations, Remove, has a special meaning when applied to the
CT_Host. Because it is the manager of Primal Processes, removing
the CT_Host removes all Cronus processes on the host. This
forces a shutdown of the Cronus system on the host.


5.4 Program Carrier 


The type CT_Program_Carrier, which is designed to support
user programs, is a subtype of CT_Primal_Process, and all of the
characteristics of primal processes are inherited by program
carriers. Additional operations can be invoked on program
carrier objects, and the set of required keys in the process
descriptor is enlarged. The program carrier


o   provides a process which can be created, loaded with a
    program, started, and stopped under remote control;


o   provides uniform monitoring and debugging support; and


o   provides application developers with the ability to
    control a collection of user written (possibly
    distributed) processes.


A Cronus host is not required to support the CT_Program_Carrier
process type; however, hosts which are not dedicated to system
service roles usually support program carriers.


The generic operations (see Cronus User's Manual object(3))
are all defined on objects of type CT_Program_Carrier. In
addition, the special operation Search_All_Descriptors is defined
on the generic program carrier object.


Create creates a new process of type CT_Program_Carrier and
returns the UID to the invoker. The program carrier manager
initializes the process descriptor of the new process. Several
of the fields have default values; in particular, the standard
input, output, and error output, and the access rights, will be
inherited from the invoker if they are set for that process.


Once a process has been created, the parent (or another
process) may alter values in its process descriptor, using the 
generic operations on the object descriptor, if it has the 
appropriate permissions. 


The Report_Status operation may be invoked on the generic
name CL_Program_Carrier to test for the availability of resources
before performing the Create operation. Resources may include
processor type, primary memory size, and special processor
capabilities, such as floating point hardware. This operation is
used as part of the scenario for selecting a site at which to run
a program (see Appendix A.B).


The Search_All_Descriptors operation may be invoked on the
generic name CL_Program_Carrier to find all program carrier
processes on a host with the designated key-value pairs in their
descriptors. Two important uses of this operation are: 1) a
search on the Key_Session key-value pair, to locate all processes
associated with a user session, and 2) a search on the Key_Thread
key-value pair, to locate all processes belonging to a thread.
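The matching rule for Search_All_Descriptors can be sketched as
follows, assuming each host keeps a mapping from process UID to a
descriptor dictionary. The function and the sample UIDs are
hypothetical; only the key names come from the text.

```python
def search_all_descriptors(descriptors, **criteria):
    """Return the UIDs whose descriptors contain every given key-value pair.

    Hypothetical sketch: 'descriptors' maps process UID -> descriptor
    dictionary for the program carriers on one host.
    """
    return sorted(
        uid
        for uid, desc in descriptors.items()
        if all(desc.get(key) == value for key, value in criteria.items())
    )


host_descriptors = {
    "p1": {"Key_Session": "s1", "Key_Thread": "t1"},
    "p2": {"Key_Session": "s1", "Key_Thread": "t2"},
    "p3": {"Key_Session": "s2", "Key_Thread": "t3"},
}
print(search_all_descriptors(host_descriptors, Key_Session="s1"))  # ['p1', 'p2']
```

Because every pair must match, adding criteria narrows the search,
for example from a whole session to a single thread within it.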


Cronus supports several kinds of relationships among program
carrier processes. All processes belonging to a session are
related, and can be located as a group; processes are related in
parent-child relationships; and processes are bound together by
the data streams that connect standard input and standard output
(and by other streams that may be explicitly opened by the
processes).


The knowledge that a group of processes belong to the same 
session is useful for coarse-grained error recovery (killing the 
session). Streams are used primarily to provide continuous data 
paths between processes. 


The parent-child relationship supports the flow of control
information among processes. When a program carrier is created
at the request of another program carrier, the list of children
in the requesting process's descriptor is updated, and the
requesting process's UID is entered as the parent in the new
process's descriptor. When a process is removed, a message is
sent to its parent. The parent can then use that information to
notify or terminate other children that were communicating with
the first process. As a result, the processes form a tree; any
subtree of this is called a process group, and the program
carrier manager supports operations on process groups as well as
on processes. These operations are applied to each process in the
subtree named by the process that the operation is invoked upon.
They reduce synchronization requirements at process start-up,
and still provide an easy mechanism to control all the children
of a process.
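A minimal sketch of the process tree and a group operation,
assuming in-memory objects in place of descriptors. The Carrier
class and its stop_group method are hypothetical; they only show
how an operation such as StopGroup reaches every process in the
named subtree and nothing outside it.

```python
class Carrier:
    """Hypothetical program carrier with parent/child bookkeeping."""

    def __init__(self, uid, parent=None):
        self.uid = uid
        self.parent = parent      # plays the role of Key_Parent
        self.children = []        # plays the role of Key_Children
        self.state = "running"
        if parent is not None:
            parent.children.append(self)

    def subtree(self):
        """The process group named by this carrier: itself plus all descendants."""
        group = [self]
        for child in self.children:
            group.extend(child.subtree())
        return group

    def stop_group(self):
        # A group operation is applied to each process in the subtree.
        for proc in self.subtree():
            proc.state = "stopped"


root = Carrier("root")
a = Carrier("a", parent=root)
b = Carrier("b", parent=root)
c = Carrier("c", parent=a)
a.stop_group()
print([(p.uid, p.state) for p in root.subtree()])
# [('root', 'running'), ('a', 'stopped'), ('c', 'stopped'), ('b', 'running')]
```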


The operations defined on objects of type CT_Program_Carrier
are described in the Cronus User's Manual (see prog_carr(3)).
In addition, the operations on its supertype,
CT_Primal_Process (see Cronus User's Manual p_process(3)) and the
generic operations (see Cronus User's Manual object(3)) can be
invoked on program carrier objects. The operations that are
specific to the program carrier objects are:


     Clear_Program
     Load_Program
     Proceed
     Suspend
     Stop
     Report_State
     Change_State
     Breakpoint
     StopGroup
     SuspendGroup
     ProceedGroup


These operations are sufficient to meet two basic 
objectives: 1) It is possible to load a binary image into a new 
program carrier object, start it, and allow the process to
complete or be cleanly stopped, and 2) the Suspend, Proceed, 
Report_State, Change_State, and Breakpoint operations, together 
with the Primal Process operations, will support general remote 
process control. 


The required keys for the object descriptor of a program
carrier are described in the Cronus User's Manual, on pages
prog_carr(3) and process(4). These include:


o   Key_MyUID, Key_MyAGS, Key_IPCEnabled, and Key_Priority,
    all of which have the same meaning for program carriers
    as for primal processes.

o   Key_State, which informs other processes of the current
    state or mode of a process. The states reflect only the
    interactions of Cronus operations and the process object,
    and do not capture finer state subdivisions which are
    host or local operating system dependent.

o   Key_StInput, Key_StOutput, and Key_StErr, which identify
    the data streams that are used for standard input,
    output, and error reporting. The streams are used in a
    manner analogous to the standard input and standard
    output of the UNIX process model. See prog_carr(3) for a
    detailed discussion of the mechanism for input/output
    redirection.

o   Key_Parent, which is the UID of the process which
    requested the creation of this process.

o   Key_Children, which are the processes, if any, created
    directly at the request of this process.

o   Key_Thread, which is a UID identifying the portion of the
    user session in which this process was created. A user
    session may consist of one or more threads of activities
    that may be running in parallel.

o   Key_Terminal, which is the UID of the virtual terminal,
    if any, that is associated with this process.

Since the program carrier object is designed primarily to support
user processes, many of the details of the use of these keys are
described in Section 11.







5.5 Process Support Library 


The Process Support Library (PSL) is a basic part of the
Cronus implementation. It contains a large number of functions
which can be used to construct Cronus object managers and user
programs. All Cronus programs are expected to use the PSL to
perform the functions which it supports. The distribution of
responsibilities between the PSL and the Cronus kernel is often
not defined, and may shift from implementation to implementation.
Any program that bypasses the standard PSL interface and makes
use of private information about this division is no longer
insulated from modifications of the definitions of the objects,
object managers, and the kernel, and the use of such a program
may produce unexpected results in the future.
The following is a partial list of the kinds of functions
which one may find in the PSL:


o   A set of standard interface routines for all operations
    on the basic Cronus objects. There are two sets of
    interface routines: those which are designed for use with
    managers and other asynchronous programs, and which do
    not wait for the response from an operation; and those
    which are intended for use in interactive programs, which
    do wait for a reply if one is expected.


o   Functions supporting composite activities, such as
    writing data on a file specified by a symbolic name.


o   Functions supporting the construction of Cronus object
    managers. These include routines for manipulating UIDs
    and UID tables, for managing the processing of requests
    and their responses in asynchronous processes, and for
    creating and modifying work-in-process and intentions
    lists.


o   A standard error reporting facility for both asynchronous
    and interactive processes.


o   Sublibraries for message composition, string
    manipulation, portable input/output operations, and
    device management.


The PSL is described in detail in Section 2 of the Cronus User's
Manual.
6 Interprocess Communication and Messages
6.1 Overview 


Cronus presents a set of facilities for the composition of 
messages and their transmission to provide a systematic 
communication facility among Cronus processes. There are three 
parts to this communication support: 


o   An interprocess communication (IPC) transport facility,
    based on the object model and object-oriented addressing,
    provides Cronus primitives for uniform, host-independent
    communication among processes. This facility, which was
    introduced in Section 4, is further described in the
    current section.


o   Conventions for passing data using Cronus canonical data
    types permit messages to be composed without concern for
    the heterogeneity within a cluster.


o   Protocols and conventions for constructing messages used
    in intercomponent interactions, especially the invocation
    of operations and the replies.


The Message Structure Library (MSL) organizes these conventions 
and protocols by providing routines for the composition and 
examination of messages. 


The IPC mechanism of Cronus is built upon the primitive
functions InvokeOnHost, SendToProcess, and Receive. These
primitives support the asynchronous communication of
uninterpreted data octets among Cronus processes, by means of the
abstractions of sending to a process or invoking an operation on
an object.


Messages, the entities communicated by the IPC, may be sent
either reliably or with minimal effort. In addition, two sizes of
message are supported: a small message, which can be carried by a
single datagram on the underlying transport mechanism, and a large
message, which may require an arbitrarily large number of
datagrams, although this distinction is hidden by the IPC library
routines. Messages may be sent and received all at once or in
pieces. The size of the chunk of data manipulated is
independently selected by the sender and receiver. Large messages
of indefinite size form the basis for interprocess stream
communication.


The Message Structure Library (MSL) is used to format
messages, but is independent of the IPC. It provides a mechanism
for inserting and extracting typed, structured data into a
message buffer in a position- and machine-independent manner.
Associated with the MSL are conventions, called the
Object-Operation Protocol, for the patterns of communication that
arise in performing operations on Cronus objects.


The IPC and message structure facilities, and their
relationship, will be discussed in the following sections. The
details of the interfaces and the specific implementation of the
IPC will be found in the Appendixes on the COS implementation and
in the Cronus User's Manual.
6.2 Messages in the IPC


The IPC facility supports two classes of messages: reliable
messages and minimal effort messages.


A message sent reliably will be delivered to the receive
queue of the addressed process (or the manager of the
addressed object on an InvokeOnHost) despite transient
failures in the communication substrate. A reliable
message will be delivered at most once.


Minimal effort messages are transmitted with whatever 
reliability characteristics are provided by the 
communications substrate. The IPC facility does not 
attempt to provide a sending process with information 
regarding the disposition of the message. 


In both cases, the message is protected by an end-to-end
checksum, so if the message is delivered, the content may be
presumed to be correct.


The sending process may use minimum effort messages whenever 
it seems appropriate. The current implementation uses them for 
all messages sent to a broadcast or multicast address. 
Messages may also be categorized by length. A small message
will fit into an IPC packet throughout the cluster. The maximum
size of a small message is implementation dependent, and in the
current system is about 1500 bytes (see Cronus User's Manual
message(4)). A large message may have a length set at the time
the message is initiated, or the length may be indefinite.
Minimal effort messages are constrained to be small, while
reliable messages may be small or large.


A large message may be of any size, although large messages
are generally longer than the small message limit; the PSL
automatically selects a small message for messages below the
limit and a large message for messages above the limit.
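The PSL's choice between small and large messages amounts to a
length test. In this sketch the 1500-byte limit is taken from the
approximate figure above, and choose_message_class is a
hypothetical helper, not a PSL routine.

```python
SMALL_MESSAGE_LIMIT = 1500  # "about 1500 bytes"; the exact value is an assumption


def choose_message_class(length, reliable=True):
    """Pick "small" or "large" for a message; length None means indefinite."""
    if length is not None and length <= SMALL_MESSAGE_LIMIT:
        return "small"
    # Indefinite or over-limit messages must be large, and large implies reliable.
    if not reliable:
        raise ValueError("minimal effort messages must be small")
    return "large"
```

Because the choice depends only on length and reliability, the
receiver never needs to know which class the sender used.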


Messages of indeterminate length support Cronus streams,
which are unidirectional data channels between a source object
(sender of the message) and a sink object (receiver). Cronus
streams are used to interconnect processes with devices and with
other processes. Although data flow on the stream is
unidirectional, the implementation of a stream involves
transmissions in both directions: from source to sink containing
data, and from sink to source containing flow control and
synchronization information.
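The two-way traffic on a stream can be pictured with a
credit-based flow control scheme. This is an assumed mechanism
chosen for illustration; the text does not say how Cronus encodes
flow control, and the Stream class is hypothetical.

```python
from collections import deque


class Stream:
    """Hypothetical credit-based model of a unidirectional stream.

    Data flows source -> sink; each chunk the sink consumes sends one
    credit back to the source (the reverse-direction control traffic).
    """

    def __init__(self, window):
        self.credits = window      # chunks the source may send before waiting
        self.in_flight = deque()   # data chunks travelling toward the sink

    def source_send(self, chunk):
        if self.credits == 0:
            return False           # source must wait for flow-control information
        self.credits -= 1
        self.in_flight.append(chunk)
        return True

    def sink_consume(self):
        chunk = self.in_flight.popleft()
        self.credits += 1          # sink -> source: room for one more chunk
        return chunk
```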


One objective for the IPC facility is to make the
distinction between small and large messages as small as
possible. In particular, the content and structure of the
information contained in a message, and any information about a
message that is delivered to a recipient (e.g., size, source,
etc.), are independent of its transmission characteristics. The
sender of a message indicates whether or not the message is to be
transmitted reliably, and its length, if it is of bounded length.
The receiver need not be concerned with these characteristics of
the message.


6.3 Programming Interface 


The programming interface for the IPC provides facilities
needed to invoke operations on objects, send messages to
processes, and receive messages from clients. Many application
programs will be written in terms of higher level routines which
may be found in the PSL. The interface described in this section
is primarily of interest to systems programmers who are
developing and maintaining object managers and PSL routines.

The interface provides direct support for the Cronus
primitives (InvokeOnHost, SendToProcess, and Receive), for the
full range of message types (reliable small, minimum effort
small, and reliable large), and for various buffering strategies
that the sending or receiving process might wish to adopt.

When a process invokes an operation on a Cronus object, it
uses the PSL function Invoke; when the message is transferred by
the SendToProcess primitive, the process uses the PSL function
Send. In either case, the process indicates the size of the
message being sent and whether it is to be sent using reliable
transmission, and points to a buffer which contains the
information which is currently available for transmission. The
buffer may contain the entire message or any portion thereof (see
Cronus User's Manual send(2)). The IPC accepts the information
for transmission, and returns a small integer, called the message
handle. If there is more information to be sent, a new buffer is
given to the SendMore function, along with the message handle.
Finally, the message is completed by applying the LastSent
function to the message handle.
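The Send, SendMore, LastSent sequence can be sketched in Python
as follows. The method names mirror the text, but the signatures,
the in-memory delivered list, and the length check in last_sent
are assumptions; the real interface is defined in the Cronus
User's Manual send(2).

```python
import itertools


class SendSide:
    """Hypothetical model of the send interface and its message handles."""

    def __init__(self):
        self._handles = itertools.count(1)  # small integer message handles
        self._pending = {}                  # handle -> message under construction
        self.delivered = []                 # completed messages (stand-in for the IPC)

    def send(self, dest, size, reliable, buffer):
        """Start a message with whatever data is available now."""
        handle = next(self._handles)
        self._pending[handle] = {"dest": dest, "size": size,
                                 "reliable": reliable, "chunks": [bytes(buffer)]}
        return handle

    def send_more(self, handle, buffer):
        self._pending[handle]["chunks"].append(bytes(buffer))

    def last_sent(self, handle):
        """Complete the message named by the handle."""
        msg = self._pending.pop(handle)
        body = b"".join(msg["chunks"])
        if msg["size"] is not None and len(body) != msg["size"]:
            raise ValueError("message length does not match the declared size")
        self.delivered.append((msg["dest"], body))


ipc = SendSide()
h = ipc.send("proc-42", 11, True, b"hello ")
ipc.send_more(h, b"world")
ipc.last_sent(h)
print(ipc.delivered)  # [('proc-42', b'hello world')]
```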


The operation switch on each Cronus host provides buffering 
for messages and synchronization between Cronus processes. 
Buffering and synchronization are closely related, because 
buffering in an intermediary influences the synchronization 
points between processes. 


The sending functions accept the message if it can be queued
somewhere within the IPC mechanism. It can be in a host-dependent
transport mechanism between the process and the
operation switch (see Figure 6.1), on the “receive queue” of a
Cronus process (if it is an intrahost message), or on the
“network queue” of messages waiting to be transmitted (if it is
an interhost message). If the message cannot be queued
immediately, it is refused by the IPC, and the sender is
responsible for any required recovery.


Even if the message is accepted, the IPC does not report
that the message has been delivered or that delivery can be
assured. The only way the sender can be assured that a message
has been received is to wait for a reply from the intended
recipient. Cronus managers respond with at least a ReplyCode
whenever an operation is invoked on an object. User processes
should normally observe a similar protocol, since lower level
protocols cannot assure delivery of messages.


[Figure: operation switch schematic; labels include receive
queues, network queue, peer-to-peer message protocol, process to
operation switch transport, Receive, and SendToHost (interhost)]


Figure 6.1 Schematic of the Operation Switch


The receive queues are maintained in FIFO order; the network
queue is a group of FIFO queues, one per destination host or
process. Entries on the receive queues are delivered to client
processes to satisfy Receive requests, and entries on the network
queue are transmitted to remote operation switches, where they
are placed on the proper receive queues.
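The queue organization, together with the accept-or-refuse rule
for senders described above, can be sketched as below. The
capacity limit, the class name, and the method names are
hypothetical; the sketch keys the network queues by destination
host only.

```python
from collections import deque


class OperationSwitchQueues:
    """Hypothetical model: FIFO receive queues per local process and one
    FIFO network queue per destination host, each with a bounded capacity."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.receive = {}   # local process -> deque of messages
        self.network = {}   # destination host -> deque of (process, message)

    def _enqueue(self, table, key, item):
        q = table.setdefault(key, deque())
        if len(q) >= self.capacity:
            return False    # refused: the sender is responsible for recovery
        q.append(item)
        return True

    def accept(self, dest_host, dest_proc, msg, local_host):
        if dest_host == local_host:                       # intrahost message
            return self._enqueue(self.receive, dest_proc, msg)
        return self._enqueue(self.network, dest_host, (dest_proc, msg))

    def receive_next(self, proc):
        q = self.receive.get(proc)
        return q.popleft() if q else None
```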
When the receiving process is prepared to process new data,
it executes the Receive or ReceiveMore function. Each new
message is started with Receive, and if the entire message is not
available, or cannot fit into the buffer that has been given to
Receive, more of the data can be read with ReceiveMore (see
Cronus User's Manual receive(2)). Both functions return
immediately with the data, if any, that is available.


The buffering strategies in the two communicating processes
may be different. The sending process can, for example, send the
entire message in one piece, and the receiving process may choose
to receive it a chunk at a time.
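The sketch below shows a receiver choosing its own chunk size
independently of how the sender supplied the data. ReceiveSide and
its receive, receive_more, and message_done methods are
hypothetical names loosely modeled on Receive and ReceiveMore; it
assumes the whole message is already queued.

```python
class ReceiveSide:
    """Hypothetical receiver that reads queued messages in chunks it chooses."""

    def __init__(self, queued_messages):
        self._queue = list(queued_messages)
        self._current = b""
        self._offset = 0

    def receive(self, bufsize):
        """Start the next message and return up to bufsize bytes of it."""
        self._current, self._offset = self._queue.pop(0), 0
        return self.receive_more(bufsize)

    def receive_more(self, bufsize):
        """Return up to bufsize further bytes of the current message."""
        chunk = self._current[self._offset:self._offset + bufsize]
        self._offset += len(chunk)
        return chunk

    def message_done(self):
        return self._offset >= len(self._current)


# The sender supplied b"abcdefghij" in one piece; the receiver reads 4 bytes at a time.
r = ReceiveSide([b"abcdefghij"])
parts = [r.receive(4)]
while not r.message_done():
    parts.append(r.receive_more(4))
print(parts)  # [b'abcd', b'efgh', b'ij']
```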


The IPC also provides functions which give the client
control over the message queues, the basic timeouts which control
error handling, and the processing of asynchronous events (see
Cronus User's Manual ipcemisc(2), receive(2)). These functions
include:


o   WaitForChange suspends the process until an interesting
    event occurs. Typically, this will be the arrival of
    another message or more data for a message which has been
    partially received. Other interesting events include
    timeouts and events which are unrelated to the IPC
    mechanism.


o   AbortMessage deletes a message from the queue without
    completing processing (either send or receive).


o   SetDefaultTimeout adjusts the standard timeout for the
    process.


o   MsgQueueSize tells how many messages are waiting for
    processing, including any partially received messages.
6.4 IPC Implementation

The implementation of the Cronus IPC can be described at two
levels. There are some elements of it which are generic; the
structure of the implementation must support those facilities
which clients expect of it. These include the overall issues of
buffering, synchronization, and reliability, for example. At the
second level, there are specific decisions about how the initial
implementation will be constructed. Future implementations of
Cronus may choose to do things in a very different way. For
example, the current implementation uses the DoD standard
connection protocol, TCP, to implement reliable message
transport. Future implementations may use a different reliable
transport mechanism.

Cronus IPC supports three types of messages:

o   small, minimum effort messages,

o   small, reliable messages, and

o   large, reliable messages.

Neither the protocols used nor the structural requirements of the
implementation specify the division of responsibility between the
operation switch and the PSL for these various classes of
message. In fact, the division might be made differently in
different hosts in the same cluster. The transport mechanisms
used in the current implementation are shown in Table 6.1.



Small, minimal effort messages are sent from Source
Operation Switch to Destination Operation Switch by means of IP
datagrams using the standard User Datagram Protocol (UDP).


Receipt of an IP/UDP datagram by the Destination Operation Switch
is not acknowledged.


On receipt of a datagram, the Destination Operation Switch
determines if the enclosed message should go to a local object or
process. If so, it places the message on the receive queue of
the object manager or process.
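Minimal effort delivery over UDP, and the destination switch's
dispatch step, can be sketched with Python's standard socket
module. The framing, in which the first line of the datagram
names the destination process, is an assumption made for the
example, not the Cronus message(4) format.

```python
import socket


def send_minimal_effort(payload, addr):
    """Send one small message as a single UDP datagram; nothing is acknowledged."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, addr)


def destination_switch_once(sock, receive_queues):
    """Take one datagram and place its body on a local receive queue.

    Hypothetical framing: b"<process>\\n<body>". Datagrams for processes
    that are not local are simply dropped; the sender is not told.
    """
    data, _sender = sock.recvfrom(1500)
    dest, _, body = data.partition(b"\n")
    queue = receive_queues.get(dest.decode())
    if queue is not None:
        queue.append(body)
    return queue is not None
```

Silently dropping undeliverable datagrams matches the rule that
the IPC gives the sender no information about a minimal effort
message's disposition.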
TYPE OF MESSAGE          TRANSPORT MECHANISM

Small, minimal effort    IP - Operation Switch <-> Operation Switch

Small, reliable          TCP - Operation Switch <-> Operation Switch

Large, reliable          TCP - One connection per large message;
                         connection establishment initiated by an
                         Operation Switch to Operation Switch
                         interaction, but connection may be in
                         the Operation Switch or the PSL, at the
                         discretion of the host implementation.


          Table 6.1 Message Transport Summary

The initial implementation of Cronus will transmit small,
reliable messages from Source Operation Switch to Destination
Operation Switch over a TCP connection because it is the fastest
way to get the implementation working. TCP provides services not
required for small reliable messages (e.g., strong sequencing,
reassembly), and we may find that the overhead they impose makes
the performance of the IPC unacceptable. If this is the case, we
will develop a reliable small message protocol (RSMP). RSMP
would perform the following services:


o   Provide receipt acknowledgement.

o   Provide for retransmission.

o   Perform duplicate detection and elimination.
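Because RSMP is only proposed, the following is a hypothetical
sketch of how its three services could fit together over a
simulated lossy link. Sequence numbers, the retry limit, and the
simulation itself are assumptions.

```python
import random


def rsmp_transfer(messages, loss_rate, seed=1, max_tries=50):
    """Hypothetical RSMP over a simulated link that drops frames at random.

    Each message carries a sequence number. The receiver acknowledges every
    copy it gets, the sender retransmits until an acknowledgement arrives,
    and the receiver discards copies it has already seen.
    """
    rng = random.Random(seed)
    delivered, seen = [], set()
    for seq, msg in enumerate(messages):
        for _ in range(max_tries):
            if rng.random() < loss_rate:
                continue                  # data frame lost: retransmit
            if seq not in seen:           # duplicate detection and elimination
                seen.add(seq)
                delivered.append(msg)
            if rng.random() >= loss_rate:
                break                     # receipt acknowledgement arrived
            # acknowledgement lost: the next try sends a duplicate
        else:
            raise TimeoutError(f"message {seq} was never acknowledged")
    return delivered
```

Duplicate elimination is what keeps lost acknowledgements from
violating the at-most-once delivery rule for reliable messages.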


As with small minimal effort messages, upon receipt of a 
message the Destination Operation Switch will determine which 
local object manager or process should receive the message and 
will place the message on its receive queue. 
Large messages are implemented through a TCP connection for
each message. There is an interaction between the source and
destination hosts to establish the TCP connection. When the
message has been transferred, the TCP connection is closed.


The following steps are used to establish a new TCP
connection to carry a large message between two processes:


1.  The source host selects the port to be used for the TCP
    connection, and puts its end of the connection into the
    listening state.


2.  The Source Operation Switch sends a StartLargeMessage
    message (see Cronus User's Manual message(4)) over the
    Operation Switch to Operation Switch TCP connection. This
    message specifies the destination, the port for the TCP
    connection, and perhaps the first part of the message.


3.  The Destination Operation Switch places the message on the
    receive queue of the object manager or process.


4.  When the destination process executes a Receive and finds
    the first part of a large message, any data sent along with
    it is delivered. The destination host selects a port for
    its end of the TCP connection, and connects to the TCP port
    supplied within the StartLargeMessage message.


5.  After the connection is established, the source host will
    use it to pass message data to the destination host.


6.  After the source process sends the last chunk of data in the
    large message, the TCP connection will be closed.
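The six steps can be exercised on a single machine with Python
sockets and a thread standing in for the source host. The
StartLargeMessage record is a plain dictionary passed through a
queue, and FIRST_PART and the function names are hypothetical;
the real message format is given in message(4).

```python
import queue
import socket
import threading

FIRST_PART = 4  # hypothetical: bytes carried inside StartLargeMessage itself


def source_side(data, control):
    """Source host: pick a port, listen, announce it, then send the rest."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # step 1: port 0 lets the OS pick a free port
    listener.listen(1)                # step 1: enter the listening state
    port = listener.getsockname()[1]
    # Step 2: stand-in for StartLargeMessage on the switch-to-switch connection.
    control.put({"type": "StartLargeMessage", "port": port,
                 "first": data[:FIRST_PART]})
    conn, _addr = listener.accept()
    with conn, listener:
        conn.sendall(data[FIRST_PART:])   # step 5: pass the message data
    # Step 6: leaving the with-block closes the connection.


def destination_side(start):
    """Destination host: deliver the first part, then connect to the announced port."""
    parts = [start["first"]]                                  # step 4
    # create_connection lets the OS choose the destination's local port.
    with socket.create_connection(("127.0.0.1", start["port"]), timeout=5) as sock:
        while True:
            chunk = sock.recv(4096)
            if not chunk:                 # source closed: message complete
                break
            parts.append(chunk)
    return b"".join(parts)


def transfer_large_message(payload):
    control = queue.Queue()
    source = threading.Thread(target=source_side, args=(payload, control), daemon=True)
    source.start()
    received = destination_side(control.get(timeout=5))
    source.join(timeout=5)
    return received
```

Steps 3 and 4's queuing on the destination switch are collapsed
into the queue.Queue hand-off here.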


This discussion does not specify whether the Operation 
Switches or the client processes are responsible for managing the 
connection that carries the bulk of the message data, nor whether 
the Operation Switches or client processes are responsible for 
actually using the TCP connection to send and receive message
data. These implementation decisions may be made differently for
each host type.
6.5 Object Operation Protocol


The Object Operation Protocol (OOP) is used by the PSL
whenever operations are invoked on Cronus objects. There are
three basic message types in this protocol: Request, Reply, and
InProgress. All of the messages in the OOP are marked as
belonging to the operation protocol, and each is marked with its
basic type. Messages arising from one Request normally contain
the same Cronus unique number, called the operation identifier. A
Request message also contains the operation name, and a Reply
message contains a standard reply code. These are the minimal
contents of the messages; they also contain additional,
operation-specific information.

The simplest message pattern involves one Request message
generated by a client, and one Reply generated by an object
manager in response.

During a manager's handling of the request, it may send an
InProgress message to the original requestor. Any number of
InProgress messages may be generated by manager processes
handling a request; they are all addressed to the process which
initiated the Request message. A client may use these messages
to reset time-outs, for example.
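A client's use of InProgress messages to reset time-outs can be
sketched as follows. The OOPMessage fields, the use of a random
UUID in place of a Cronus unique number, and the await_reply
helper are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OOPMessage:
    """Hypothetical layout for the three OOP message types."""
    kind: str                         # "Request", "Reply", or "InProgress"
    op_id: str                        # operation identifier shared by one Request's messages
    operation: str = ""               # operation name, present in a Request
    reply_code: Optional[int] = None  # standard reply code, present in a Reply
    body: dict = field(default_factory=dict)


def new_request(operation, **body):
    # uuid4 stands in for a Cronus unique number.
    return OOPMessage("Request", uuid.uuid4().hex, operation=operation, body=body)


def await_reply(op_id, incoming, timeout, start=0.0):
    """Scan (arrival_time, message) pairs; InProgress pushes the deadline back."""
    deadline = start + timeout
    for t, msg in incoming:
        if t > deadline:
            return None                   # timed out before a Reply arrived
        if msg.op_id != op_id:
            continue                      # belongs to some other operation
        if msg.kind == "InProgress":
            deadline = t + timeout        # manager is still working: reset time-out
        elif msg.kind == "Reply":
            return msg
    return None
```

Without the two InProgress messages in the test, a Reply at time
11 would arrive after a 5-unit time-out and be treated as lost.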

We distinguish between a simple operation (or operation) and
a compound operation. A simple operation has a single operation
name and operation identifier. Any manager process, in the
course of acting upon a Request, may invoke one or more new
(simple) operations by sending Request messages. A compound
operation is the aggregate of all simple operations arising from
or caused by the invocation of one simple operation. Normally,
all of the suboperations will complete before the initiating
simple operation completes. Each of the simple operations has
its own operation identifier, so a process may invoke several
suboperations in parallel.

Sometimes a manager cannot complete the processing required
for an operation; for example, a request for a catalog lookup may
be satisfied only by the cooperation of catalog managers on two
hosts. The manager may then either:


o   perform as much processing as it can, and send a Reply that
    is marked Incomplete, or

o   elect to complete it using suboperations, which follow
    the same pattern as requests, and send a Reply when the
    operation is complete.


If the manager chooses the first of these alternatives, it can
often send the text of the message that the client needs to send
to the other manager as part of the Reply. The client can
complete the operation by invoking another simple operation.


It is desirable for a Cronus process to be able to query the
status of a compound operation. The operation identifier of the
original request is used as a global identifier for each
suboperation. Since this identifier is included in the Request
messages of all simple operations it causes, the managers acting
on suboperations can respond to a status query keyed to the
initiating identifier.
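Keying status queries to the initiating identifier can be
sketched as below. Each hypothetical Manager records, for every
simple operation it is working on, the global identifier carried
in the Request; the class, method, and identifier names are
illustrative only.

```python
class Manager:
    """Hypothetical manager tracking active simple operations by global id."""

    def __init__(self, name):
        self.name = name
        self.active = {}   # own operation id -> global (initiating) operation id

    def handle(self, op_id, global_id):
        self.active[op_id] = global_id

    def finish(self, op_id):
        self.active.pop(op_id, None)

    def status(self, global_id):
        """Simple operations still in progress on behalf of the compound operation."""
        return sorted(op for op, g in self.active.items() if g == global_id)


def compound_status(managers, global_id):
    """Ask every manager about one compound operation; omit idle managers."""
    return {m.name: m.status(global_id) for m in managers if m.status(global_id)}


cat_a, cat_b = Manager("catalog-a"), Manager("catalog-b")
cat_a.handle("op1", "op1")   # the original request is its own global id
cat_b.handle("op2", "op1")   # suboperation caused by op1
cat_b.handle("op3", "op9")   # unrelated work
print(compound_status([cat_a, cat_b], "op1"))
# {'catalog-a': ['op1'], 'catalog-b': ['op2']}
```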


6.6 Message Structure


The primary design goal for the Cronus message structure is
the regularization of control traffic. Control traffic includes
requests for operations to be performed on objects, replies
generated by operations, exception notices, and messages needed
to coordinate distributed object managers. Control messages are
usually short (tens to hundreds of octets). Because performance
is a major issue, messages should be compact, and efficiently
composed and parsed.


A message structure can be evaluated in a number of ways. A
discussion of evaluation criteria, and an application of these
criteria to a number of well-known message structures, may be
found in [BBN 5261]. As a result of that analysis, a standard
Cronus message structure was formulated. It has the following
characteristics:


o   Messages are self-describing, so the fields may be
    identified by name rather than by order. This simplifies
    the parsing of messages, at the cost of transmitting the
    identifying information.


o   The conventions rely only on features that are available
    in many programming languages. This improves the
    portability of the implementation, at the cost of making
    any single implementation more expensive.


o   The need to define new data types, which are treated in
    the same way as the pre-defined types, is explicitly
    recognized. This is consistent with the general
    philosophy of Cronus design.


o   Name and data type fields are compactly coded, and
    efficient programming interfaces are provided, while the
    overhead of a general message format is held down. These
    all contribute to good system performance.


The Message Structure Library (MSL) is a collection of
functions that is part of the PSL; these routines fall into three
classes:


o   application interface functions,
o   data translation functions, and
o   structure manipulation functions.


The application interface procedures construct the message in an
external representation, which is machine independent, using the
data translation and structure manipulation functions. This data
structure can be transmitted from one process to another, and
subsequently parsed by MSL procedures at the receiving process. A
summary of the functions and a cross reference to detailed
discussions of them may be found in the Cronus User's Manual, on
page msl(2).


The Cronus external representation is based on key-value
pairs, where the key is a conventional name that is stored with
each data value. The key indicates the meaning of the value.
The value, in turn, consists of a data type indicator and the
actual data. Including the type indicator assures us that we can
move the data from one Cronus host to another. The internal
representation of the data may differ at the sending and
receiving hosts, but it is always transmitted in a canonical
form, along with its type [Herlihy 1982].

A canonical type is either an atomic or a composite type. An
atomic type, such as boolean or signed 16-bit integer, defines a
set of primitive data values. A composite type, such as array,
has substructure defined in terms of other canonical types (see
Cronus User's Manual can_types(4)).


Keys are coded as short (16-bit) integers, but values can
vary in length from one octet to many thousands, are not
restricted in form, and may be built from simple or composite
data types.
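The following Python sketch illustrates the key-value idea under stated assumptions: the numeric type codes (T_BOOL, T_INT16, T_ARRAY), the framing of each value as a 2-byte type indicator plus a 4-byte length, and the big-endian byte order are hypothetical choices made for illustration, not the actual Cronus encoding defined in can_types(4). It shows 16-bit keys, a type indicator carried with every value, and a composite array type built recursively from other canonical types.

```python
import struct

# Hypothetical type-indicator codes (not the real can_types(4) values).
T_BOOL, T_INT16, T_ARRAY = 1, 2, 3


def encode_value(value):
    """Return type indicator + length + canonical (big-endian) data."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        tag, body = T_BOOL, struct.pack(">?", value)
    elif isinstance(value, int):  # signed 16-bit; struct.error if out of range
        tag, body = T_INT16, struct.pack(">h", value)
    elif isinstance(value, list):  # composite: element count, then elements
        tag = T_ARRAY
        body = struct.pack(">H", len(value)) + b"".join(encode_value(v) for v in value)
    else:
        raise TypeError(f"no canonical type for {type(value).__name__}")
    return struct.pack(">HI", tag, len(body)) + body


def decode_value(buf, pos):
    """Decode one value starting at buf[pos]; return (value, next position)."""
    tag, length = struct.unpack_from(">HI", buf, pos)
    pos += 6
    end = pos + length
    if tag == T_BOOL:
        return struct.unpack_from(">?", buf, pos)[0], end
    if tag == T_INT16:
        return struct.unpack_from(">h", buf, pos)[0], end
    if tag == T_ARRAY:
        (count,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        items = []
        for _ in range(count):
            item, pos = decode_value(buf, pos)
            items.append(item)
        return items, end
    raise ValueError(f"unknown type indicator {tag}")


def encode_message(fields):
    """fields maps 16-bit keys to values; each value carries its own type."""
    out = bytearray(struct.pack(">H", len(fields)))
    for key, value in fields.items():
        out += struct.pack(">H", key) + encode_value(value)
    return bytes(out)


def decode_message(buf):
    """Rebuild the key -> value mapping from an encoded message."""
    (count,) = struct.unpack_from(">H", buf, 0)
    pos, fields = 2, {}
    for _ in range(count):
        (key,) = struct.unpack_from(">H", buf, pos)
        fields[key], pos = decode_value(buf, pos + 2)
    return fields
```

Because each field is located by its key and interpreted by its own type indicator, the receiver does not depend on the order in which the sender packed the fields, which is the self-describing property listed above.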


Most IPC messages passed among managers or between processes
and managers use a high-level protocol called the Object-Operation
Protocol (OOP). OOP is based on a set of well-known
keys which are used by object managers (see Cronus User's Manual
keys(4)).

7 Authentication, Access Control, and Security

7.1 Introduction

The goals of the Authentication and Access Control facility
are:

1. Prevention of unauthorized use of Cronus and unauthorized
access to DOS maintained data and services.

2. Preservation of the integrity of the system and its
components against intentional insertion of unauthorized
components.

3. Support for a uniform user view of access control to the
resources and functions provided by Cronus.

4. Survivable authentication functionality.

The design of the access control and authentication facility
assumes that systems in a Cronus cluster are all in a single
administrative domain. There are three broad classes of hosts
within the cluster:

o hosts dedicated entirely to Cronus system functions and
not user programmable;

o hosts supporting user applications using tamper-proof
multiple protection domains (trusted multi-access hosts);
and

o hosts supporting user applications without secure
multiple protection domains (single-user workstation
hosts).

We assume all hosts supporting dedicated Cronus functions
and multiple user protection domains are physically secure from
tampering. Workstations may not be completely physically secure,
but have at least a tamper-proof component. At minimum, this
component is in the local network address insertion and reception
function. It could, however, be higher up in the workstation
system: in the virtual local network internet address insertion
and reception function, in the object system process-unique
identifier insertion and reception function, or even higher. In
this sense, all user-programmable hosts support multiple
protection domains (user and system), although in the limiting
case, the "system" domain may simply be a piece of network
interface hardware. Since we are not aware of any workstation
systems meeting this requirement, we assume future product
packaging changes. There seem to be two viable positions to take
regarding the assumptions on these changes.


1. Assume only an absolute minimum, that a single low level
"address" can be protected.


2. Allow the set of protected functions to grow as needed to
conveniently interface the workstation in a manner as
similar as possible to multi-access systems.


The extreme solution to the second approach could be an access
machine for each workstation, although other solutions are also
possible. For our current work we will assume the second
approach, planning only for an arguably insecure implementation
directly within the workstation.


The network (cable) itself may also not be totally
physically secure. While parts of it can be expected to be
secure (e.g. within a secure machine room), other parts can be
expected to be exposed to unauthorized connection.


7.2 The Cronus Access Control Concept 
7.2.1 Decomposition of the Access Control Problem


The basis of access control in Cronus is the ability of
Cronus to reliably deliver the address of a sender of a message
(or invoker of an operation) to the receiver of the message. The
Cronus communication subsystem is implemented so that this is
true. That is:


For IP and the Virtual Local Network:


If the sender is within the Cronus cluster, the
internet host address of the sender is reliably
delivered to the receiver. If the sender is not within
the cluster, a non-cluster internet host address is
delivered to the receiver, which can be interpreted by
the receiver as an indication that the authenticity of the
sender's address might be suspect.


For the Cronus IPC/object system:


The UID of the sending or invoking process is reliably 
delivered to the recipient of the message. 


The recipient of a request can decide on the basis of the 
sender's identity whether or not to perform an operation 
requested. 


For this to be a useful basis for access control, a means
for reliably associating some authorization with senders'
addresses and process UIDs is required.


One approach is to make static bindings between
authorizations and addresses or UIDs. These bindings would be
"well-known", such that when a process receives a request from
the process with UID_Y it knows that the process is acting under
the Z_Authority. This method is used in the ARPANET TELNET and
FTP protocols: users assume that the processes at sockets one and
three are under the authority of the host administration and can
be trusted with their passwords. Static bindings are too
restrictive to be the sole mechanism in a system like Cronus,
although a few static bindings are required for the access
control mechanism to work (see Section 7.6).


Dynamic binding is useful when authorities are not all known
at system creation time, and when processes are dynamically
created. The system must support mechanisms not only to
dynamically establish the binding between a process and an
authority, but also to dynamically determine the binding from
some system entity in a trustworthy manner.


Most Cronus activity is the result of requests initiated by
users of the system. Human users are represented by an
abstraction called a "principal". If we extend the notion of a
principal to include elements of the system, such as object
managers, all activity in the system can be thought of as
initiated by principals. System elements which are principals
are called "system principals". Each Cronus principal (human or
system entity) has a unique identifier. Different system
principals have different authorities. For example, the primal
file manager and the printer service are Cronus system
principals, neither of which needs to be authorized for all of the
objects and operations accessible to the other.


Access control can be thought of as consisting of the 
following steps: 


1. Identification. Determine the identity of the principal
that is requesting a particular operation.

2. Authorization. Determine whether the principal has been
authorized to perform the operation.


For example, when an object manager must decide whether to
perform an operation, it must know the identity of the principal
that is requesting the operation (Identification) and the rights
the principal may have with respect to the operation
(Authorization).


7.2.2 Authorization 


Cronus uses access control lists to support authorization.
The access control list (ACL), which is part of the object
descriptor, "protects" a particular action. In the simplest
case, it is a list of the principals who have authorization to
perform the action. When a principal attempts an operation, the
list is checked for the principal; if the principal is present,
the authority to perform the operation has been verified and the
operation may occur.


In Cronus this simple idea is extended in two ways: 


o Group identifiers may appear on an ACL, so an entire
group of principals can be authorized as a unit, or have
its authorization revoked as a unit.

o A set of rights is associated with each identifier on an
ACL. A single list can selectively control a principal's
or a group's access to an object for which several
operations are defined, such as a file. Rights are
abstract, bound to specific operations by the
implementer.


An ACL is a list which contains elements of the form:

    (id, rights)

where "id" is either a principal (PID) or a group identifier
(GID), and "rights" define the principal's or group's
authorization with respect to the object the ACL protects. The
allowable rights for a particular ACL are dependent upon the type
of object being protected.


Users log into Cronus as principals by supplying an
appropriate name and corresponding password(10). A system
component called the Authentication Manager maintains records of
all principals and groups. Collectively, these records form a
User Data Base (UDB). At login time the Authentication Manager
expands the membership of a user-specified subset of the access
control groups of which the user is a member. This is a transitive
closure computation on the specified list of group identifiers in
the user's record. The user's own id, the PID, is added to the
result of the expansion. The resulting set of identifiers is
called the access group set (AGS) for the process(11):


    AGS = {PID} ∪ Show_Group_Membership_Expanded(GID),
    taken over every default GID in the PID record.
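A minimal Python sketch of the AGS computation, assuming that group records are available as a mapping from each GID to the GIDs of the groups it is directly a member of, and that Show_Group_Membership_Expanded returns the starting GID together with every group reached transitively. The helper names and the identifier "World" used for the special world group are placeholders, not Cronus interfaces.

```python
from collections import deque


def show_group_membership_expanded(gid, member_of):
    """Return gid plus every group it belongs to, directly or transitively.

    member_of maps a GID to the GIDs of the groups it is directly a member of.
    """
    expanded = {gid}
    queue = deque([gid])
    while queue:
        current = queue.popleft()
        for parent in member_of.get(current, ()):
            if parent not in expanded:  # the visited set also stops membership cycles
                expanded.add(parent)
                queue.append(parent)
    return expanded


def compute_ags(pid, default_gids, member_of, world="World"):
    """AGS = {PID} plus the expansion of each default GID, plus group world."""
    ags = {pid, world}
    for gid in default_gids:
        ags |= show_group_membership_expanded(gid, member_of)
    return ags
```

The closure follows membership upward only: expanding a group adds the groups that contain it, never the group's own members.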


The AGS is used in access control checks as follows. When
an action protected by an ACL is attempted, the ACL is compared
with the principal's AGS. If an entry of the form

    (ID, (..., Right, ...))

where

    ID is in the AGS, and
    Right is required to perform the action,

is found on the ACL, the principal's authorization is verified
and the action may be performed.


(10). See Appendix A for a more complete description of the
login and session initiation scenarios.

(11). The basic ideas associated with Access Group Sets have been
adapted from similar work at Carnegie Mellon University in the
Central File System project.
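The ACL check described above can be sketched in a few lines of Python. The sketch assumes an ACL is a list of (identifier, rights) pairs with the rights held as a set of abstract right names; the right names in the usage lines are hypothetical.

```python
def acl_permits(acl, ags, right):
    """Return True if some ACL entry (ID, rights) has ID in the AGS and
    the required right among its rights."""
    return any(ident in ags and right in rights for ident, rights in acl)


# Usage: "read" and "write" stand in for implementer-defined rights.
acl = [("alice", {"read"}), ("Staff", {"read", "write"})]
print(acl_permits(acl, {"alice", "World"}, "write"))           # False
print(acl_permits(acl, {"alice", "Staff", "World"}, "write"))  # True
```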


During a session, a user may add and remove identities from
the current AGS. To add a group identity, the user must be a
member of the added group. Updating the current AGS is
accomplished via operations invoked on the Authentication
Manager, which cause the update of the current process AGS list.
These operations affect a single process; however, the new AGS
will be inherited only by children created subsequently.


7.2.3 Identification in Cronus 
There are two related identification problems. 


I1. At the start of each session, the identity of the user
must be established.


I2. Processes must be able to ascertain the identity of the
principal corresponding to the processes with which
they interact.


The solution to both problems lies in a set of mechanisms that
bind processes with principal ids and group identifiers. These
mechanisms depend upon the ability of the communication system to
reliably deliver the UID of a sending process to the receiver of a
message.


It is useful to restate these problems in the following
terms:

1. A binding must be established between a process and an
AGS.
2. There must be a means for a process P1 to determine the
binding between another process P2 and its AGS.

When a user approaches Cronus to start a session, a process (P1)
is allocated(12). P1 cannot be bound to U (the user's principal
identifier) until Cronus establishes the connection via password
authentication. Before that happens, P1 is bound to a well-known
principal, "NotLoggedIn", which has minimal authorization. One
task of the login procedure is to change the binding of P1 from
NotLoggedIn to U.


The binding between a principal identity and a process is
established by the Authenticate_As operation. The user engages
in an authentication dialogue with Cronus, supplying a name and
password which are checked against the UDB. If the authentication
dialogue succeeds, the AGS for U is computed and a binding is
established between P1 and U. A record of the binding

    (P1, U, AGS)

is maintained by the process manager for the authenticated
process, to be used throughout the process lifetime. The
identity of the user has been established, completing problem I1.

Throughout the course of U's session, P1 and other processes
acting on behalf of U attempt actions which require authorization
verification by the processes that perform the actions. This is
problem I2. Consider a situation in which P1 has requested
another process (S1) to perform some action (A), as shown in
Figure 7.1.


In order to perform an access control check, S1 needs to
determine the binding of P1. The identity of P1 is known to S1
because P1's UID was delivered along with the operation
invocation that requests A. S1 can obtain the binding of P1 by
invoking the Authorization_Binding_Of operation:


    Authorization_Binding_Of(P1) -> U, AGS


Authorization_Binding_Of causes a message to be sent from S1 to
the manager for process P1, which returns the bindings for the
process to S1.


Figure 7.1 Retrieving Access Control Data


(12). Cronus actually uses a more complex process structure to
support a user session, as indicated in Appendix A.3. However,
the following discussion is insensitive to these details, so we
use this simple model in our explanation.


The login sequence establishes a binding between user (U) 
and an "initial" process (Pl). Bindings are established for 
other processes created during a user session through 
inheritance. During a user session, processes created by an 
authenticated process inherit both the principal identity and the 
current AGS of the initiating process. Object managers attain 
their principal identities and access group sets as part of the
system initialization phase. 
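The binding rules described so far (an initial NotLoggedIn binding, Authenticate_As, Authorization_Binding_Of, inheritance by children, and the per-process effect of enabling a group described in Section 7.2.2) can be sketched as a process manager's binding table. The class and method names are hypothetical; the membership test that the Authentication Manager would perform is modelled by passing in the set of groups the principal belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    principal: str
    ags: frozenset


class ProcessManager:
    """Hypothetical table of process UID -> (principal, AGS) records."""

    def __init__(self):
        self._bindings = {}

    def allocate(self, proc_uid):
        # A newly allocated login process starts with minimal authority.
        self._bindings[proc_uid] = Binding("NotLoggedIn", frozenset())

    def authenticate_as(self, proc_uid, principal, ags):
        # Recorded after a successful Authenticate_As dialogue.
        self._bindings[proc_uid] = Binding(principal, frozenset(ags))

    def create_child(self, parent_uid, child_uid):
        # A child inherits the principal and the parent's *current* AGS.
        self._bindings[child_uid] = self._bindings[parent_uid]

    def enable_access_group(self, proc_uid, gid, groups_of_principal):
        # Only a group the principal belongs to may be added, and only this
        # process is changed; children created earlier keep their old AGS.
        if gid not in groups_of_principal:
            raise PermissionError(f"not a member of {gid}")
        old = self._bindings[proc_uid]
        self._bindings[proc_uid] = Binding(old.principal, old.ags | {gid})

    def disable_access_group(self, proc_uid, gid):
        old = self._bindings[proc_uid]
        self._bindings[proc_uid] = Binding(old.principal, old.ags - {gid})

    def authorization_binding_of(self, proc_uid):
        # What S1 receives when it asks P1's manager for P1's binding.
        binding = self._bindings[proc_uid]
        return binding.principal, binding.ags
```

Because Binding is immutable, a child can share its parent's record at creation time, and a later change to the parent replaces the parent's record without touching the child's.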

7.3 Access Control List Initialization

A common problem associated with Access Control List
mechanisms is the effort required for proper explicit (manual)
initialization. In practice, the ACL for a new object can often
be automatically predetermined based upon the type of the object,
the creator, and the context in which the object is created
(primarily the directory in which it is subsequently catalogued).
This is the premise upon which the Cronus Initial Access Control
List (IACL) mechanism is based.

A list of type-specific IACLs may be associated with
selected Cronus objects, currently Principal and Directory
objects. The IACLs are manipulated using the standard ACL
manipulation operations (ReadACL, AddToACL, RemoveFromACL),
distinguished by an optional key denoting the type with which the
IACL is to be associated. The IACL mechanism also supports the
Cronus type hierarchy: the IACL associated with an ancestor in
the type hierarchy will be used if a more specific IACL for the
type itself has not been specified.

Cronus Create operations incorporate the following algorithm
for initializing the ACL of newly-created objects:


1) A list of "IACL hints" (UIDs of objects potentially
having IACLs associated with them) is searched in order
for an IACL pertaining to the type of the object being
created. The first one found is used. These hints
usually reference the Cronus directory where the object
will subsequently be catalogued.

2) If no IACL search is specified, or the hints fail to
yield an appropriate IACL, the object for the Principal
invoking the operation is queried as if it were included
at the end of the hints list.


3) If an IACL is still not found, the invoking Principal is 
given all rights to the object. 
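The three-step algorithm can be sketched in Python as follows. The sketch assumes (an interpretation, not stated above) that the type hierarchy is consulted within each hinted object before moving on to the next hint; the callback parameters, the type names, and the ALL_RIGHTS marker are hypothetical.

```python
ALL_RIGHTS = frozenset({"ALL"})  # hypothetical marker for "all rights"


def initial_acl(obj_type, hints, invoker, iacls_of, parent_type):
    """Choose the ACL for a newly created object of type obj_type.

    hints       -- UIDs of objects that may carry IACLs (e.g. the directory)
    invoker     -- UID of the invoking principal's CT_Principal object
    iacls_of    -- function: UID -> dict mapping type name to an IACL
    parent_type -- function: type name -> supertype, or None at the root
    """

    def lookup(uid):
        table = iacls_of(uid)
        t = obj_type
        while t is not None:  # fall back to ancestors in the type hierarchy
            if t in table:
                return table[t]
            t = parent_type(t)
        return None

    # Steps 1 and 2: search the hints in order, with the invoking
    # principal's object treated as if appended to the end of the list.
    for uid in list(hints) + [invoker]:
        acl = lookup(uid)
        if acl is not None:
            return acl
    # Step 3: no IACL found anywhere; the invoker gets all rights.
    return [(invoker, ALL_RIGHTS)]
```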


There are user commands for setting up, examining, and
modifying the initial access control lists retained with Cronus
objects.

7.4 Authentication Manager


The Authentication Manager defines and maintains two types
of abstract Cronus objects: CT_Principal and CT_Group. Like
other system objects, the CT_Principal and CT_Group identifier
objects have symbolic names for convenient human access.
Principals are symbolically named from a private name space
maintained by the Authentication Manager, which ensures their
uniqueness across the entire system. Symbolic group identifiers
can be placed anywhere in the Cronus catalog, at the convenience
of the creating user.

Operations on objects of type CT_Principal and of type
CT_Group are controlled by access control lists. By convention,
any legitimate principal can create a new CT_Group object, but
only administratively authorized principals can create a new
principal. When the system is initialized, it contains at least
one pre-defined principal, which is authorized to create other
principals.


In the following sections we discuss the design of the
objects and operations supported by the Authentication Manager.
Section 7.9 discusses how to make the functions of the
Authentication Manager survivable.


7.5 Objects Related to Authorization 


The object of type CT_Authentication_Data is the user data
base consisting of the records for system users and for groups of
principals which have been defined in the system.


The object of type CT_Principal is the permanent data base
entry that Cronus maintains for each legitimate user. It is the
repository for such user-specific data as default priority and
other parameters associated with resource management; default
modes of behavior (e.g. default working directory); and
authorization data. It is expected that new kinds of data will
be added to the principal objects from time to time.

A CT_Principal object can be expected to contain the
following data:

o Principal unique-identifier (PID)
o Symbolic name of principal
o Access control list
o Encrypted password
o Direct group memberships
o Direct group memberships to be expanded on Login
o Range of priority service authorized
o Default priority
o Name of default initial subsystem
o Name of home directory for the principal
o ... (other user-specific data)


The priority data will be used in resource management 
functions. The default subsystem is the program automatically 
invoked following login. A home directory is a directory 
assigned to the principal that serves as the initial current 
directory for catalog accesses. In particular, it contains
additional user initialization data.


Groups (objects of type CT_Group) gather a number of
identities for purposes of collectively granting them rights to
objects and operations. Any user can create a new group, and
place any other principal or group in it. This group can then be
placed on an ACL. The access control list for the group object
controls modification of the group definition.


A CT_Group object contains at least the following data: 


o GID for the group
o Name of the group
o GIDs of the groups of which the group is directly a
member
o IDs of principals (PIDs) and groups (GIDs) that are
direct members of the group


There are a few special group identifiers. One of these
(group world) represents the set of principal identifiers without
actually enumerating them anywhere. This group identifier is
automatically appended to every AGS computation. Another special
group, "Wheel", represents an access control override capability
used for system maintenance, implicitly receiving all rights to
all Cronus objects. Admission to this group is carefully
controlled.

A convention has been adopted which effectively supports
wheel capability only for objects of a specified type. A process
whose principal ID matches the PID of the manager process is
automatically granted all rights to all objects managed by that
manager. This is useful in handling peer managers. As an
example, all file managers are bound to a special file manager
principal, and implicitly have all access to all files managed by
peer file managers.


7.6 Operations on Authorization Related Objects 


The generic operations to create and remove objects, and to
examine and modify the object descriptor, ACL, and object status,
apply to instances of CT_Principal and CT_Group.


The following operation (see Cronus User's Manual
auth_data(3)) is used during login to establish the binding of
the user to the principal UID:


Authenticate_As


The following operations allow processes to control the
identities applicable to an authenticated process (see Cronus
User's Manual auth_data(3)). They affect only a single process,
which may be either the invoking process or another process
authenticated to the same principal.


Enable_Access_Group
Disable_Access_Group


The following operations maintain and interrogate the
objects of type CT_Principal (Cronus User's Manual principal(3)):


Lookup_Principal
Show_Group_Memberships
Add_to_Default_Group_Expansion_List
Delete_from_Default_Group_Expansion_List
Change_Password


The rest of the data in the principal entry in the user data 
base is treated as part of the object descriptor. The generic 
operations which manipulate the object descriptor are used to 
examine and set these fields. 


The following operations are used to inspect and maintain 
the group identifier objects (Cronus User's Manual group(3)): 


Add_to_Group 
Remove_from_Group 
Show_Group_Members 


The rest of the data in objects of type CT_Group is
contained in the object descriptor and is maintained using the
generic operations defined on object descriptors.


The access control list of any object, including objects of
type CT_Group and CT_Principal, can be set using the generic
operations on access control lists (see Cronus User's Manual
object(3)).


7.7 Operation of the Access Control Authorization Function 


Cronus access control checks the current identity of the
accessing agent against access control lists maintained by the
service provider. A process is authenticated in a way which
binds the process UID to a set of external identities defining
the authorizations of the process. These identities, the AGS,
are available to any service-providing process. This section
discusses the authorization function which is part of the service
provider.


In general, the access control steps within an object 
proceed as follows: 

1. The request is parsed to determine the originating
process UID and the operation/object requested. The
process_UID is trusted because it is added to the message
by the operation switch. Universal public privilege for
the operation on all objects managed by the manager is
checked first, to see if the specific access check is
needed.

2. A manager-based cache of process/object authorization
pairs for the process_UID is checked for a valid current
entry.

3. If there is no corresponding cache entry, the accessing
agent's AGS is obtained. This data is also cached, but on
a per-host basis by the AGS cache manager. If present on
the host, this cache manager provides a high performance
interface to the Authentication_Bindings_Of function.
There is a broadcast-based protocol for alerting AGS
cache managers to entries that should be purged. If an
AGS cache manager does not run on a host, managers
execute the Authentication_Bindings_Of operation
directly, and the AGS is not cached. [The per-host AGS
caching is not yet designed or implemented.]

4. The access control software computes a new
process_UID/object authorization entry using the AGS and
the access control list maintained with the protected
object/operation. The process_UID authorization entry is
then put in the manager cache.

5. The process_UID/object authorization is used to verify
permission. If authorized, the operation is passed on to
the operation code. If unauthorized, the request is
rejected.

6. To allow for the enabling of new access groups, steps 3-5
are repeated if the check made with the cached AGS fails.


The permission authorization function is accomplished by a
set of routines and data structures that we call the "gatekeeper"
because of its role as protector of the objects/operations.
Gatekeeper functions can be invoked as part of the procedures for
receipt of a message, or called directly from the host process.
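The six access control steps can be sketched as a small Python class. This is a simplification under stated assumptions: binding_of stands in for the Authentication_Bindings_Of operation, the per-host AGS cache is modelled as a dictionary inside the gatekeeper, cache invalidation when an ACL changes is omitted, and the Wheel group and peer-manager conventions of Section 7.5 are folded in as an override. All names are hypothetical.

```python
WHEEL = "Wheel"


class Gatekeeper:
    """Hypothetical per-manager access checker following steps 1-6."""

    def __init__(self, acl_of, binding_of, manager_pid, public_rights=()):
        self.acl_of = acl_of            # object UID -> list of (id, rights)
        self.binding_of = binding_of    # process UID -> (principal, AGS)
        self.manager_pid = manager_pid  # PID shared by peer managers
        self.public_rights = frozenset(public_rights)
        self.auth_cache = {}            # (process UID, object) -> rights
        self.ags_cache = {}             # stand-in for the per-host AGS cache

    def _compute(self, proc_uid, obj, refresh_ags):
        # Step 3: obtain the AGS, from the cache unless a refresh is forced.
        if refresh_ags or proc_uid not in self.ags_cache:
            self.ags_cache[proc_uid] = self.binding_of(proc_uid)
        principal, ags = self.ags_cache[proc_uid]
        # Overrides: peer managers and members of Wheel get all rights.
        if principal == self.manager_pid or WHEEL in ags:
            return None  # None means "all rights"
        # Step 4: union of the rights of every ACL entry whose ID is in the AGS.
        rights = set()
        for ident, entry_rights in self.acl_of(obj):
            if ident in ags:
                rights |= set(entry_rights)
        return frozenset(rights)

    def check(self, proc_uid, obj, right):
        # Step 1: universal public privilege needs no per-process check.
        if right in self.public_rights:
            return True
        key = (proc_uid, obj)
        # Step 6: on failure, repeat steps 3-5 once with a fresh AGS.
        for refresh in (False, True):
            # Step 2: reuse a cached authorization entry when one exists.
            if refresh or key not in self.auth_cache:
                self.auth_cache[key] = self._compute(proc_uid, obj, refresh)
            rights = self.auth_cache[key]
            # Step 5: verify permission.
            if rights is None or right in rights:
                return True
        return False
```

The retry in step 6 is what lets a group enabled with Enable_Access_Group take effect even though an older, failing authorization entry is still cached.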

Access control can be applied to operations on the object
set supported by the receiving manager process, or on operations
defined by the receiving service. There is a fixed maximum
number of access control rights maintained by the gatekeeper
software (currently 32) for any object. These rights are
represented as positions in a bit vector associated with both the
identity it authorizes (principal identifier or group identifier)
and the object it controls.
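A sketch of the 32-position rights vector in Python. The assignment of positions to particular rights (READ, WRITE, DELETE below) is hypothetical, since rights are bound to operations by each implementer; the sketch assumes the vectors of all identities in the AGS are combined by bitwise OR.

```python
MAX_RIGHTS = 32  # fixed maximum number of rights per object

# Hypothetical bit positions chosen by an implementer.
READ, WRITE, DELETE = 0, 1, 2


def rights_mask(*positions):
    """Build a 32-bit rights vector with the given positions set."""
    mask = 0
    for pos in positions:
        if not 0 <= pos < MAX_RIGHTS:
            raise ValueError(f"right position {pos} out of range")
        mask |= 1 << pos
    return mask


def effective_rights(acl_masks, ags):
    """OR together the vectors of every ACL identity present in the AGS."""
    combined = 0
    for ident in ags:
        combined |= acl_masks.get(ident, 0)
    return combined


def has_right(acl_masks, ags, pos):
    return bool(effective_rights(acl_masks, ags) & (1 << pos))
```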


7.8 Host Registration 


The lack of physical security for various parts of the 
system presents problems for the access control subsystem. Since 
the network cable may be accessible to tampering, the network
might be tapped. An outsider could then inject or inspect 
packets under an assumed network address. A workstation might 
pose as the site of a trusted manager. We can use administrative 
authorization to alleviate these problems. 


Encryption of all local network traffic is a form of
authorization. It can remove the threat of tapping, whether for
listening to or inserting packets. Providing the host with
the encryption/decryption key is administrative authorization to
participate in the Cronus cluster. If a host can communicate at
all, it can be considered an authorized host. Because
encryption/decryption is isolated in the communication interface,
it can be added transparently at any time. While communication
encryption can be thought of as part of the Cronus design, it
will not be part of the initial implementation.


Since workstations may be treated specially for some access
control decisions, a system configuration registry could be the
source of such identification. In addition, because it is
undesirable to tightly control responses to broadcast Locate
operations, the registry is useful in determining the authenticity
of the respondent. A configuration registry enumerates all of the
authorized system hosts, and the system services (Cronus
functions) which they have been authorized to run.
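A minimal Python sketch of how such a registry could screen responses to a broadcast Locate. The registry layout (a mapping from host internet address to the set of services the host is authorized to run), the function name, the addresses, and the service names are all hypothetical.

```python
def screen_locate_responses(registry, service, responders):
    """Keep only responders registered as authorized to run the service.

    registry   -- dict: host internet address -> set of authorized services
    responders -- addresses of hosts that answered the Locate broadcast
    """
    return [host for host in responders if service in registry.get(host, ())]


registry = {
    "192.0.2.1": {"file_manager", "authentication_manager"},
    "192.0.2.7": {"file_manager"},
}
print(screen_locate_responses(registry, "authentication_manager",
                              ["192.0.2.7", "192.0.2.1", "198.51.100.9"]))
# ['192.0.2.1']
```

A host absent from the registry, or registered without the requested service, is treated as an unauthenticated respondent and dropped.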


One secure way to make the registry service available is to
support it on one (or more) well-known Cronus hosts (i.e. hosts
at well-known internet addresses, say host No. 1, ...). The
configuration data can then be obtained with an Invoke On Host to
the well-known hosts using the logical name for the service(13).
The cluster configuration service would support the following
functions:


Show_Configuration_Hosts
Set_Configuration_Hosts


Standard access controls apply, with Show_Configuration_Hosts
being universally allowed, while Set_Configuration_Hosts is limited
to a system administration group.

7.9 Survivable Authorization Design


7.9.1 Objectives


The authentication function and evaluation of the current AGS are
critical parts of the operation of Cronus. These functions must
be available at all times or Cronus cannot operate effectively.
Our objectives in providing survivability in Authentication are:


a. A Cronus user should, under reasonable failure patterns,
always be able to gain access to the system.

b. The current value of the process-AGS binding should be
available whenever a process is able to request services
from object managers.

c. A less important but desirable objective is that a
client be able to continue to perform maintenance
operations on the principal and group objects despite
failures of hosts supporting these functions.











To meet objectives (a) and (c), we must replicate the
Authentication function. To meet objective (b), we must maintain
the bindings in a replicated fashion, or keep them close to the
process to which they refer, so that the bindings are available
when the process makes requests of other Cronus managers.


(13). Since this function is often used to determine the
veracity of responses to the Locate operations, it cannot safely
use Locate to find out where configuration managers are running.


7.9.2 Observations 


The authentication function is a global DOS function supported on
a GCE which is expected to be up most of the time. Because these
services are simple, the host hardware and software should be
stable, increasing its availability. Since the GCE is relatively
inexpensive, it is also feasible to stock a spare.


The authentication function is based on maintaining two related
types of objects. The data bases which the Authentication
Manager maintains to support the principal and group objects are
not large. The principal data base is estimated to hold no more
than 1000 users, with an average entry having around 1000 bytes
of data. The group data base might have 2000 entries, averaging
300 bytes of data. This is less than 2 MBytes of data, and can
easily be accommodated on a GCE.
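The arithmetic behind the 2-MByte estimate, using the entry counts and average sizes given above:

```latex
\begin{aligned}
\text{principals:}\quad & 1000 \times 1000\ \text{bytes} = 1.0 \times 10^{6}\ \text{bytes}\\
\text{groups:}\quad & 2000 \times 300\ \text{bytes} = 0.6 \times 10^{6}\ \text{bytes}\\
\text{total:}\quad & 1.6 \times 10^{6}\ \text{bytes} \approx 1.6\ \text{MBytes} < 2\ \text{MBytes}
\end{aligned}
```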


The processing demand on Authentication managers is not expected
to be large. Aside from initial authentication and group
expansion, which occurs typically once per user per session,
other operations are infrequent. New users and groups are
occasionally created and the associated data bases occasionally
displayed and updated. A single GCE appears easily capable of
handling anticipated processing requests.


Performance and size considerations do not seem to require more
than a single GCE per cluster. Survivability is the primary
motivation for replicating the Authentication Manager. Our
approach is to maintain completely replicated data bases on two
or more GCEs.


Of the operations performed by the Authentication Manager, the
one of most concern for survivability is Authenticate_As, which
is a read-only function. This is also true of a number of other
AM operations (Lookup Principal, Show Groups Expanded, etc.).
Synchronization of multiple Authentication Managers is not
required to complete these operations.

Some AM operations do modify the authentication data (e.g., Create
New Principal, Modify User Parameters, etc.). These require
synchronization among Authentication Managers for consistency.
However, because these operations are relatively infrequent and
have simple semantics, a simple approach to synchronization which
does not attempt to maximize concurrency will suffice. We designate
a primary Authentication Manager as a single point of
synchronization. This method is backed up by an alternate
procedure if the primary site is inaccessible. A complete
description of our approach follows in the next section.













In the current implementation, each process has a process manager
on the same host. The process-AGS bindings are maintained by the
process manager in the process descriptors for these processes.
During host outages when a manager is inaccessible, so too will
be the processes it manages. There is no need to maintain the
process-AGS binding any more reliably than we maintain the
process itself. At some later point, we will address issues
of process survivability. We can then naturally think in terms of
replication of process descriptor data (including the current
AGS) as part of the reliable process concept, and need not
address it separately.






















7.9.3 Approach 


Fully redundant copies of the authentication data bases are
maintained at more than one Cronus host. This means that,
ignoring synchronization, an operation can be completed at any
site which maintains the data base. We expect that two
operational authentication sites will provide sufficient
availability for most applications of Cronus.


A spare GCE could be integrated into the system if one of the
dedicated hosts needs to be taken off-line for any extended
period. This minimizes the time during which there may be only a
single Authentication site functioning. The new host integration
protocol first involves transmission of all of the existing
objects. When the object transmission is complete, the new
manager retrieves the change log and incorporates any updates.
The final step before assuming operational status is to
coordinate with any ongoing activities.
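
A minimal sketch of the three-step integration protocol is given below. It assumes, hypothetically, that change-log records carry increasing sequence numbers and that the object transfer in step 1 is tagged with the sequence number of the last change it reflects; neither detail is specified here, and the final coordination step is only noted, not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChangeRecord:
    seq: int                 # position in the change log (assumed to increase)
    name: str                # principal or group name (hypothetical key)
    value: Optional[dict]    # new contents of the object, or None for a deletion


@dataclass
class AuthManager:
    objects: Dict[str, dict] = field(default_factory=dict)
    applied_seq: int = 0     # last change record reflected in `objects`
    operational: bool = False

    def incorporate(self, rec: ChangeRecord) -> None:
        if rec.seq <= self.applied_seq:
            return           # already reflected in the transferred objects
        if rec.value is None:
            self.objects.pop(rec.name, None)
        else:
            self.objects[rec.name] = dict(rec.value)
        self.applied_seq = rec.seq


def integrate_new_host(snapshot: Dict[str, dict], snapshot_seq: int,
                       change_log: List[ChangeRecord]) -> AuthManager:
    new = AuthManager()
    # Step 1: transmission of all of the existing objects.
    new.objects = {name: dict(obj) for name, obj in snapshot.items()}
    new.applied_seq = snapshot_seq
    # Step 2: retrieve the change log and incorporate updates made meanwhile.
    for rec in change_log:
        new.incorporate(rec)
    # Step 3: coordination with ongoing activities is not modeled here.
    new.operational = True
    return new
```

Skipping records already covered by the transfer makes the log replay safe to run over the whole log.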


Each operation on authentication data objects is an independent
transaction, so that there is no linkage between any two
operations. The operations either reference the identified
objects (read operations) or modify the identified objects (write
operations). Read operations require no synchronization or
concurrency control between Authentication Managers. Any read
operation can be handled by any available Authentication Manager.
Some read operations have side effects which do change the state
of other system variables (e.g., Authenticate_As modifies the
current process AGS in its process descriptor), but these are
idempotent operations, so repeating them at distinct sites as part
of error recovery is not harmful.

Write operations, on the other hand, require synchronization
among the Authentication Managers to preserve the consistency of
the data with respect to concurrent updates. To do this, one AM
is chosen as the primary site. The designation of which AM is
primary is found in the configuration data base for the system.
Clients as well as other AM processes can consult this data base
to find the primary site. The primary site remembers its role
and will respond to broadcast requests to identify itself in case
the configuration file is inaccessible.
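
Since reads may go to any AM, only writes need the primary. The lookup a client or AM performs to find it might be sketched as follows. The two callables stand in for the configuration data base lookup and the broadcast request; their names and signatures are hypothetical, not Cronus interfaces.

```python
from typing import Callable, Iterable, Optional


def find_primary_am(read_config: Callable[[], Optional[str]],
                    broadcast_query: Callable[[], Iterable[str]]) -> Optional[str]:
    """Return the host of the primary AM, or None if it cannot be found.

    read_config: looks up the primary in the configuration data base and
        raises OSError if that data base is inaccessible (assumed behavior).
    broadcast_query: broadcasts "who is primary?" and yields the replies;
        only the AM that remembers its primary role is expected to answer.
    """
    try:
        primary = read_config()
        if primary is not None:
            return primary
    except OSError:
        pass  # configuration file inaccessible: fall back to broadcast
    for reply in broadcast_query():
        return reply  # first answer wins
    return None
```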

All write operations are initiated with the primary AM, which
serializes the modifications to the data base. The primary AM
records the modification in a change log by appending a change
record to a multi-copy reliable file. After logging the request,
it updates its own data base and informs the other operational AMs
of the change. If all AMs are running, the data bases are again
synchronized after each one incorporates the update. When an AM
is restarted, it processes the change log to incorporate changes
made to the data base in its absence before it will accept new
requests. Multi-copy files are used for change logs so that
reintegration does not depend on the availability of any single
host.

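
The write path and the restart rule can be sketched in a few lines. This is a simplified in-process model under stated assumptions: the `ChangeLog` class is a hypothetical stand-in for the multi-copy reliable file, and records are assumed to be (sequence number, name, value) triples.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str, Optional[dict]]   # (sequence number, name, new value or None)


class ChangeLog:
    """Stand-in for the multi-copy reliable change-log file (hypothetical)."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def append(self, name: str, value: Optional[dict]) -> Record:
        rec = (len(self.records) + 1, name, value)
        self.records.append(rec)
        return rec


class AM:
    def __init__(self) -> None:
        self.db: Dict[str, dict] = {}
        self.applied = 0        # sequence number of the last record incorporated
        self.running = True

    def incorporate(self, rec: Record) -> None:
        seq, name, value = rec
        if seq <= self.applied:
            return
        if value is None:
            self.db.pop(name, None)
        else:
            self.db[name] = value
        self.applied = seq

    def restart(self, log: ChangeLog) -> None:
        # Catch up on changes made in this AM's absence before accepting requests.
        for rec in log.records:
            self.incorporate(rec)
        self.running = True


def primary_write(primary: AM, others: List[AM], log: ChangeLog,
                  name: str, value: Optional[dict]) -> Record:
    rec = log.append(name, value)     # 1. log the change first
    primary.incorporate(rec)          # 2. update the primary's own data base
    for am in others:                 # 3. inform the other operational AMs
        if am.running:
            am.incorporate(rec)
    return rec
```

Logging before updating means a restarted AM can always recover every change from the log alone.
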
This approach raises two issues.

a. What, if anything, should we do about read/write
synchronization for read operations that may be processed
by a non-primary AM while the corresponding object is
undergoing modification by the primary AM?

b. What, if anything, should we do when a modification is
requested and the primary AM is inaccessible?

To answer question (a), we first observe that not only is the data
changed infrequently, but much of it is particular to a single
Cronus user, and hence concurrent read and write access is quite
unlikely. Furthermore, an old copy of just-modified data is
almost never harmful. The behavior is similar to a race
condition between independent accesses to a single-copy data
base. Thus our approach to read/write synchronization is to do
nothing.

There are many possible answers to question (b). One approach is
to do nothing, and reject these operations temporarily until the
primary AM is brought back on-line. Since modifications to
authentication data are not critical to the operation of the
system, the major effect of this is inconvenience, because we will
need to repeat the operations at a later time. A simple
mechanism which avoids this uses the lock on the change log file
as a tool for serializing updates from any of the available AMs.
In this scheme, when the primary AM is inaccessible, any AM can
initiate the update if it can first lock the change log. It then
informs the other operational AMs of the change. When the
primary comes back, it integrates the changes it has missed
before assuming primary update responsibility again.

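
The change-log lock fallback can be sketched as below. A `threading.Lock` stands in for the file lock on the change log, and the operational AMs are assumed to be up to date when the update starts; class and function names are hypothetical.

```python
import threading
from typing import Dict, List, Tuple


class ChangeLog:
    def __init__(self) -> None:
        self.lock = threading.Lock()   # stands in for the lock on the change-log file
        self.records: List[Tuple[str, dict]] = []


class AM:
    def __init__(self) -> None:
        self.db: Dict[str, dict] = {}
        self.seen = 0                  # number of log records incorporated


def fallback_update(initiator: AM, others: List[AM], log: ChangeLog,
                    name: str, value: dict) -> bool:
    """Used by any AM while the primary is inaccessible; False means retry later."""
    if not log.lock.acquire(blocking=False):
        return False                   # another AM is updating; updates stay serialized
    try:
        log.records.append((name, value))
        for am in [initiator, *others]:  # assumes these AMs were already up to date
            am.db[name] = value
            am.seen = len(log.records)
    finally:
        log.lock.release()
    return True


def primary_rejoin(primary: AM, log: ChangeLog) -> None:
    """Integrate the missed changes before resuming primary responsibility."""
    with log.lock:
        for name, value in log.records[primary.seen:]:
            primary.db[name] = value
        primary.seen = len(log.records)
```
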
8 Cronus File System


8.1 File System Overview


Cronus supports a number of different kinds of files,
including:

   o Primal files.

     The primal file is the most basic kind of Cronus file.
     Other kinds of Cronus files are implemented from primal
     files. A primal file is stored entirely within a single
     host, and is bound to the host.

   o Reliable files.

     A reliable file is implemented by one or more primal
     files. Each primal file used to implement a reliable
     file contains all of the file data. The reliability of
     these files derives from the fact that the file is
     accessible as long as at least one of the primal files
     that implement it is.

   o Dispersed files.

     A dispersed file is implemented by one or more primal
     files. A dispersed file is one whose contents may be
     distributed over more than one host. Each of the primal
     files used to implement a dispersed file contains part of
     the contents.

The initial Cronus implementation (the “primal system”)
supports only primal files, which are implemented upon underlying
single-host file systems. The next major Cronus release (the
“reliable system”) will support reliable files. Later system
releases may support dispersed files.

This section also describes a single-host file system,
called the Elementary File System, which will be developed for
each Cronus file host to serve as a common base of implementation
support for Cronus file managers.

Primal files are Cronus objects. They have unique
identifiers (UIDs), and may be given symbolic names. There is a
Cronus object type CT_Primal_File.


Executable programs will be stored as files of type
CT_Executable_File, which is a subtype of primal file
(CT_Primal_File). There will be many different kinds of hosts in
Cronus, and an executable program file which can run on one host
type will usually not be able to run on another. In addition to
the normal descriptive information, files of this type have
information that specifies where they can be run. The additional
information maintained for an executable file would include:

   o The type of processor required to execute the program
     stored in the file.

   o The run-time environment required by the program,
     including the local operating system and necessary
     peripheral devices.

8.2 Cronus Primal Files


8.2.1 Cronus Primal Files

Because primal files cannot be moved from one host to another, the
primal file system is partitioned among the hosts that store primal
files. The HostNumber component of the UID for a primal file
always specifies the host on which the file is stored. A copy of
a primal file can be created on another host, and the original
can then be deleted. The copy is a different primal file with a
different UID; it just happens to contain the same data as the
original file.

Like other Cronus objects, primal files are accessible to
processes by means of the interprocess communication and
operation switch (Section 6). There is a Primal File Manager
process on each host that stores part of the primal file system.
A client process accesses a primal file by invoking an operation
on the file, in which the UID for the file and the operation to
be performed on the file are specified.

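
Because the HostNumber component fixes a primal file's host, choosing the Primal File Manager for an operation reduces to a lookup on that component. The sketch below assumes a UID consisting only of a host number and a per-host serial number; that layout, and the function names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class UID:
    host_number: int   # the HostNumber component: where the primal file is stored
    serial: int        # distinguishes objects created on that host (assumed field)


def primal_file_manager_for(uid: UID, managers: Dict[int, str]) -> str:
    """Pick the Primal File Manager that must perform an operation on `uid`."""
    try:
        return managers[uid.host_number]
    except KeyError:
        raise LookupError(f"no Primal File Manager on host {uid.host_number}") from None


def copy_uid_on_host(original: UID, dest_host: int, new_serial: int) -> UID:
    """UID for a copy of `original` on another host: a different primal file."""
    if dest_host == original.host_number:
        raise ValueError("the copy is to be created on another host")
    return UID(dest_host, new_serial)
```

The copy receives a fresh UID naming its new host, which is why "moving" a primal file really means copying and deleting it.
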
The Primal File Manager that maintains a primal file also
defines a mapping between the UID for the primal file and the
information required to manage the file. The collection of
information necessary to manage a primal file is called its
descriptor. The file descriptor includes:

     UID of the creator;
     Date and time of creation;
     Date and time of last write;
     Access control list (ACL) for the file;
     Information necessary to find the file data on
     the storage media;
     Current size of the file;
     Other information (to be specified as needed).

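
The descriptor fields listed above map naturally onto a record type. In this sketch the ACL encoding (principal to set of rights) and the storage-location field (a list of block numbers) are assumptions made for illustration; the `note_write` helper models the side effect of a write on the descriptor.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass
class PrimalFileDescriptor:
    creator_uid: str                 # UID of the creator
    created: datetime                # date and time of creation
    last_write: datetime             # date and time of last write
    acl: Dict[str, Set[str]]         # principal -> access rights (assumed encoding)
    storage_blocks: List[int]        # where the data lives on the media (assumed)
    size: int = 0                    # current size of the file, in octets
    other: Dict[str, object] = field(default_factory=dict)

    def note_write(self, end_offset: int, when: datetime) -> None:
        """Side effect of a write: update last-write time and, if needed, size."""
        self.last_write = when
        self.size = max(self.size, end_offset)
```
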
Most of the operations provided by conventional file systems
(create, read, write, etc.) are implemented for Cronus primal
files. The design is discussed in terms of the normal life cycle
of a primal file, which includes:

1. The file is created.
2. Data in the file may be read or written by a client.
3. Information in the file descriptor may be read or written
   by a client.
4. The right to access the file may be granted to or revoked
   from other users.
5. The file may be deleted.


File creation involves: the generation of a UID; the
creation and initialization of a descriptor for the file; and the
binding of the UID and the file descriptor in the Primal File UID
Table. Until data is written into the file, the file is empty.
When a primal file is created by a Primal File Manager, it is
created on that manager's host.
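
The three creation steps can be sketched as follows. The UID layout (host number, per-host serial), the dictionary descriptor, and the initial ACL granting the creator read and write rights are all assumptions made for illustration.

```python
import itertools
from datetime import datetime, timezone
from typing import Dict, Tuple

UID = Tuple[int, int]   # (HostNumber, per-host serial); hypothetical layout


class PrimalFileManager:
    def __init__(self, host_number: int) -> None:
        self.host_number = host_number
        self._serials = itertools.count(1)
        self.uid_table: Dict[UID, dict] = {}      # Primal File UID Table
        self.data: Dict[UID, bytearray] = {}

    def create(self, creator: str) -> UID:
        uid = (self.host_number, next(self._serials))   # 1. generate a UID on this host
        now = datetime.now(timezone.utc)
        descriptor = {                                   # 2. create and initialize a descriptor
            "creator": creator, "created": now, "last_write": now,
            "acl": {creator: {"read", "write"}}, "size": 0,
        }
        self.uid_table[uid] = descriptor                 # 3. bind UID and descriptor
        self.data[uid] = bytearray()                     # empty until data is written
        return uid
```

Because the manager's own host number goes into every UID it issues, the file is necessarily created on that manager's host.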


There is an issue regarding whether it should be necessary
to open a primal file before reading or writing file data. One
reason for “open” and “close” is to provide for reader-writer
synchronization; another is optimization of read/write
operations. The disadvantage is that open/close add complexity
to the Primal File Manager, because it must maintain state
information for open files and deal with the problem of files
opened which are never explicitly closed (e.g., because the
client's host has crashed). Furthermore, if we require open and
close, additional operations must be invoked on the file even
when the read or write is for a small amount of data.


The Primal File Manager supports access to files without
open and provides an open/close facility for clients that need
it. A read or write without open is called a “free read” or a
“free write”. The client may then choose whether the additional
overhead of opening and closing the file is worthwhile. For
example, if we wish to write a simple log message when a process
is initiated, we would probably choose the free write. If, on
the other hand, we were copying a file, we would probably choose
to open the files, incurring the overhead of initiation once, and
gaining further system support for synchronization and data
integrity. A client process may read or write data in a primal
file (subject to authorization considerations) without opening
it, unless another process has opened the file in such a way that
free reads and writes are forbidden.


Free reads and writes are synchronized in the sense that
multiple reads and writes are serializable. This means that the
File Manager will, in effect, perform each read or write
operation in its entirety before performing another operation.
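
One simple way to obtain this serializability is a per-file mutex held for the whole of each operation, as sketched below. This is an illustrative technique, not the Primal File Manager's implementation. Note that a plain mutex is stricter than Table 8.1, which lets a free read proceed while another read is in progress; a readers-writer lock would match the table more closely.

```python
import threading


class FreeAccessFile:
    """Free reads and writes on one primal file, serialized by a per-file mutex."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._octets = bytearray()

    def free_read(self, first: int, count: int) -> bytes:
        with self._lock:                     # whole read happens before any other op
            return bytes(self._octets[first:first + count])

    def free_write(self, first: int, data: bytes) -> None:
        with self._lock:                     # whole write happens before any other op
            end = first + len(data)
            if end > len(self._octets):      # grow the file (zero-filled) if needed
                self._octets.extend(bytes(end - len(self._octets)))
            self._octets[first:end] = data
```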


When a file is opened, two parameters specify the access
state requested. One specifies either Read or ReadWrite access.
The second specifies the type of reader-writer synchronization
desired. Two types of synchronization are supported:
“frozen”, which permits either N readers or a single writer, and
“thawed”, which permits any number of simultaneous writers and
readers. When a file is opened with “thawed” access, readers of
the file see updates made by writers of the file. Opening a file
with “thawed” access prevents other processes from opening it
“frozen”.

Thus, the access states defined for a file are:

     free;
     frozen read open;
     frozen readwrite open;
     thawed open;
     (free) read in progress;
     (free) write in progress.


A file may be opened so long as the access state requested
does not conflict with the current access state of the file.
Table 8.1 defines the compatibility of the access states with one
another, and with read and write operations invoked by a client
without previously opening the file. An OK for an (OPERATION,
ACCESS STATE) entry in the table means that a client process can
perform the operation on a file when the file is in the
corresponding access state; a NO entry means that the operation
will fail when the file is in the corresponding state; a DELAY
entry means that the operation will be delayed until the
operation in progress (and any others that may be queued) has
completed.


The data in a primal file is a sequence of octets, numbered
from 0 to N. The read operation specifies the first octet to be
read and the number of octets to be read. The write operation
specifies the octet position of the first octet to be written and
the octets of data to be written.


In order to support file system recovery, data that is
written to a file that has been opened for (ReadWrite, Frozen)
access does not become part of the permanent file data until the
file is closed. It is possible to close a file opened for
(ReadWrite, Frozen) access in a way that aborts writes made to
the file while it was open.
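
The visibility rule for (ReadWrite, Frozen) opens can be modeled with a private working copy that replaces the permanent data only on a retaining close. This models the semantics only: the in-memory copy below is not itself crash-safe, the class name is hypothetical, and copying the whole file is just one possible staging technique.

```python
class FrozenReadWriteOpen:
    """A (ReadWrite, Frozen) open whose writes stay private until close."""

    def __init__(self, permanent: bytearray) -> None:
        self._permanent = permanent            # the file's permanent data
        self._working = bytearray(permanent)   # private working copy for this open

    def read(self, first: int, count: int) -> bytes:
        return bytes(self._working[first:first + count])

    def write(self, first: int, data: bytes) -> None:
        end = first + len(data)
        if end > len(self._working):
            self._working.extend(bytes(end - len(self._working)))
        self._working[first:end] = data

    def close(self, retain_writes: bool = True) -> None:
        if retain_writes:
            self._permanent[:] = self._working  # all writes become permanent together
        # otherwise the working copy is dropped, aborting every write made while open
        self._working = bytearray()
```

The `retain_writes` flag corresponds loosely to choosing between a Close-and-RetainWrites close and an aborting close.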


A file is open to a particular process. The Primal File Manager
provides an operation which returns a list of the UIDs for the
processes, if any, that have a given file open. Another
operation returns a list of the UIDs for the files, if any, that
a given process has open.


When a process is destroyed with files open, the files are
closed and any writes to (ReadWrite, Frozen) open files are
aborted. The normal close operation may only be invoked by the
process that opened the file. An alternate close operation can
be used by other processes to close a file during cleanup.


                         ACCESS STATE
                         free   frozen   frozen      thawed   read in    write in
OPERATION                       read     readwrite            progress   progress

frozen read open         OK     OK       NO          NO       OK         DELAY
frozen readwrite open    OK     NO       NO          NO       DELAY      DELAY
thawed open              OK     NO       NO          OK       DELAY      DELAY
free read                OK     OK       NO          OK       OK         DELAY
free write               OK     NO       NO          OK       DELAY      DELAY

               Table 8.1  Access State Compatibility

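
Table 8.1 can be encoded directly as a lookup that a file manager might consult before admitting an open or a free operation. The sketch assumes a file has a single current access state, and its names are illustrative.

```python
from typing import Dict, Tuple

OK, NO, DELAY = "OK", "NO", "DELAY"

# Column order of Table 8.1 (the file's current access state).
ACCESS_STATES: Tuple[str, ...] = (
    "free", "frozen read", "frozen readwrite", "thawed",
    "read in progress", "write in progress",
)

# Rows of Table 8.1: the operation a client requests.
TABLE_8_1: Dict[str, Tuple[str, ...]] = {
    "frozen read open":      (OK, OK, NO, NO, OK,    DELAY),
    "frozen readwrite open": (OK, NO, NO, NO, DELAY, DELAY),
    "thawed open":           (OK, NO, NO, OK, DELAY, DELAY),
    "free read":             (OK, OK, NO, OK, OK,    DELAY),
    "free write":            (OK, NO, NO, OK, DELAY, DELAY),
}


def check(operation: str, state: str) -> str:
    """Return OK, NO (the operation fails) or DELAY (it waits its turn)."""
    return TABLE_8_1[operation][ACCESS_STATES.index(state)]
```

For example, `check("free write", "frozen read")` yields NO, since a frozen reader excludes writers.
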

A client can read the descriptor of a primal file. Some of
the information in the file descriptor is changed as a side
effect of operations on the file. For example, when a file is
written, the date and time of last write is changed. There is
other information that the client may wish to change explicitly.

Access to a primal file is controlled by its access control
list (ACL). Access to a primal file may be granted to other
users by adding entries to the ACL. Similarly, access to a file
may be revoked from a user by removing the corresponding entry
from the ACL.

Some file systems support the notion of Delete, Undelete, and
Expunge operations. The current design for the primal file
system assumes that only Delete (called Remove) will be
supported, but it is relatively straightforward to modify the
specification of Cronus primal files to accommodate a Delete,
Undelete, and Expunge model of file removal.

8.2.2 Crash Recovery Properties


If a primal file operation is invoked, the Primal File
Manager normally acknowledges the operation, indicating the
disposition of the operation (e.g., success, failure, and reason)
and, depending upon the operation, returning any data requested.


The Primal File Manager does not acknowledge write requests
until the data has been written to non-volatile storage. A
client process can be sure that the data has been written when
the acknowledgement is received, even if the Primal File Manager
or its host should crash shortly afterward.


Primal file write operations are atomic with respect to host
crashes. That is, if the Primal File Manager host should crash
during a write operation, then after the host and Primal File
Manager have been restarted and the Primal File Manager has
performed its recovery procedures, the write operation will have
either occurred in its entirety or not occurred at all. If
the crash occurs after the data has been safely written but
before the acknowledgement has been sent, the acknowledgement
will never be generated.
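
On a present-day POSIX system, the same all-or-nothing, acknowledge-after-durable behavior for a whole-file rewrite is commonly obtained by writing a temporary file, forcing it to stable storage, and renaming it over the original. The sketch below shows that technique as an analogy only. It does not show the Primal File Manager's recovery procedure, which is not described here, and it handles whole-file replacement rather than octet-range writes.

```python
import contextlib
import os
import tempfile


def durable_atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so that after a crash it holds either the old or new data.

    Returns only after the new contents are on non-volatile storage, so a caller
    may acknowledge the write once this function returns.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())          # new data is durable before it is visible
        os.replace(tmp_path, path)          # atomic rename over the old file
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)  # POSIX: make the rename itself durable
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```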


This atomicity property also holds for the Close-and-RetainWrites
operation. That is, either none or all of the writes made while
the file was open will have been performed.

8.2.3 Operations for Objects of type CT_Primal_File

In addition to the generic operations (Cronus User's Manual,
object(3)), the following operations are supported for primal
files:

     Open
     Close
     Sync
     Read
     Write
     Truncate
     Append
     FilesOpenBy
     OpenStatusOf
     CloseProcessOpenFile
     CloseAllProcessOpenFiles

The Open and Close operations provide an atomic transaction
capability for a single primal file. At some later point, we may
define explicit BeginTransaction, EndTransaction, and
AddToTransaction operations which could be used to provide a
capability for transactions that involve more than a single
primal file.

In response to a Status operation, the Primal File Manager
returns information about the status of the primal files it
manages (Cronus User's Manual, p_filesys(3)), such as the amount
of free space, the amount of space used by existing files, the
number of files it manages, the number of files currently open,
etc. This information will be useful to system operations
personnel as well as to clients who might use it when deciding
where to create primal files.

8.3.1 Objectives


The principal motivation within Cronus for maintaining
multiple copies of a file derives from reliability
considerations. The objective is to increase the probability
that the file will be available for access at any given time by
keeping copies (in Cronus we shall call them images) of the file
at a number of hosts. Although any given host that stores the
file may fail, so long as at least one of the hosts maintaining
an image is accessible, the file will be accessible too.


Secondary benefits include performance improvements that may
result from distributing the load due to file access among the
hosts that store the file, and from the possibility that client
access to an image of the file maintained on its own host will be
more responsive than access to an image on a remote host.


Increased file availability does not come for free. The
cost is increased complexity in managing the files. Most of the
complexity is a consequence of the fact that Cronus works to
ensure the mutual consistency of the file images: when one image
of the file changes, all others should be updated to reflect the
change.


Furthermore, in the Cronus environment it is desirable to
support concurrent access to files. For example, Cronus supports
a form of multiple readers / single writer concurrency control
for primal files. The same sort of concurrency control is
provided for multi-image files.


Concurrency control requires that sites managing images of a
file cooperate to synchronize client access to the file. There
is complexity and overhead associated with this cooperation. In
addition, since strong concurrency control mechanisms require the
participation of more than one site, situations may arise where
an insufficient number of file image sites are accessible to
perform the concurrency control. Unless the system is willing to
permit unsynchronized access to an accessible file image in such
situations, some of the reliability benefits of multi-image files
will be lost. The danger of unsynchronized access is, of course,
that accessors may cause different images of a file to become
inconsistent.


The Cronus approach to concurrency control for reliable
files is based on the presumption that file availability is
important enough that it is permissible to risk the consistency
of file images and to grant access to file data when
synchronization cannot be achieved. That is, when a choice must
be made, file availability or survivability is considered more
important than mutual consistency of file images.


The approach to concurrency control is to try to achieve
strong synchronization prior to file access in order to maintain
the consistency of the file images. However, should the
synchronization fail because the file sites required to achieve
it are inaccessible, the client will be informed, and access to
the file will be permitted only if the client gives explicit
consent to continue.


This relaxed approach to concurrency control will be
practical only if:

a. File access patterns are such that it is relatively
unusual for multiple concurrent updates to occur.

b. Hosts are reasonably reliable, so that host failures that
prevent strong synchronization are relatively rare.

c. There is a simple and inexpensive way to detect
inconsistent images of a file. We believe that the
Version Vector mechanism developed at UCLA [Parker 1983]
is a good one for this purpose.
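
The essential comparison rule of version vectors is sketched below. Each image carries a count of updates originated at each site. One image is newer if its counts are all at least as large; if neither covers the other, the images were updated without synchronization and have diverged. Representing a vector as a dictionary keyed by site name is an assumption; how Cronus encodes version vectors is not specified here.

```python
from typing import Dict

VersionVector = Dict[str, int]   # site -> number of updates originated at that site


def record_update(vv: VersionVector, site: str) -> None:
    """An image updated at `site` increments that site's entry."""
    vv[site] = vv.get(site, 0) + 1


def compare(a: VersionVector, b: VersionVector) -> str:
    sites = a.keys() | b.keys()
    a_covers_b = all(a.get(s, 0) >= b.get(s, 0) for s in sites)
    b_covers_a = all(b.get(s, 0) >= a.get(s, 0) for s in sites)
    if a_covers_b and b_covers_a:
        return "identical"
    if a_covers_b:
        return "a is newer"       # b can simply be brought up to date from a
    if b_covers_a:
        return "b is newer"
    return "conflict"             # concurrent, unsynchronized updates: images diverged
```

The check is cheap: it costs one integer comparison per site, independent of file size, which is what makes it suitable for detecting inconsistency after unsynchronized access.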


Experience with Cronus may show that there are some
applications which require more absolute synchronization than
this approach supports. If that proves to be the case, the
support for reliable files will be augmented to include a file
type for which stronger synchronization is supported.


8.3.2 Reliable Files as Composite Objects 


A reliable file is a Cronus object of type
CT_Reliable_File. A Cronus Reliable File (RF) is a collection of
one or more primal files, each of which represents an image of
the reliable file. No two images of a reliable file are stored
on the same host.

The number of images of a reliable file may change over the
lifetime of the file, as may the sites which maintain the
individual images. The desired number of images is called the
cardinality of the file. The actual number of file images may be
different from the file cardinality. For example, when a file is
first created, its cardinality will be greater than the number of
images until all of the images are created. Similarly, if the
cardinality of a file is changed, it takes a finite amount of time
for the number of images to be adjusted. Thus, the cardinality is
properly thought of as an objective.
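
Treating cardinality as an objective suggests a reconciliation step that compares it with the current images and decides what to create or remove. The helper below is hypothetical. It assumes each host holds at most one image of a file, and its choices of which hosts gain or lose an image are arbitrary placeholders for a real placement policy.

```python
from typing import List, Sequence, Tuple


def reconcile_images(cardinality: int, image_hosts: Sequence[str],
                     candidate_hosts: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (hosts that should gain an image, hosts that should drop one).

    Assumes at most one image of a file per host.
    """
    if cardinality < 1:
        raise ValueError("a reliable file has at least one image")
    current = list(image_hosts)
    if len(current) < cardinality:
        spare = [h for h in candidate_hosts if h not in current]
        return spare[:cardinality - len(current)], []
    # Which surplus images to remove is a policy choice; here: the last listed.
    return [], current[cardinality:]
```

If too few candidate hosts are available, fewer images are scheduled than requested, and the file simply remains short of its objective.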


A reliable file of cardinality 1 is a migratory file.
Although it has only a single image like a primal file, unlike a
primal file it may be moved from one host to another.


Each Reliable File Manager (RFM) maintains a UID table for
the reliable files that it manages. Unlike simpler objects, such
as primal files, the management of reliable files requires the
cooperation of RFMs. Each RFM participates in the management of
a collection of reliable files (the ones in its UID table), but
not all RFMs participate in the management of all reliable files.


Depending on the cardinality of a particular reliable file,
an RFM may need to cooperate with 0 (cardinality = 1), 1
(cardinality = 2), or more (cardinality > 2) other RFMs. For
each reliable file it manages, an RFM is directly responsible for
carrying out the operations on a particular primal file that
represents an image of the file. We shall sometimes refer to
that image as the manager's image or as the local (to the
manager) image.


When a client invokes an operation on a file, the underlying
interprocess communication facility routes the operation to an
RFM capable of performing it. Any interactions among RFMs that
are required to perform the operation are transparent to the
client process.


Access to the primal files that comprise a reliable file is
limited to RFMs. No other process may directly access a primal
file used to implement a reliable file, even if the process has
the UID for the primal file; this is enforced by the Cronus
access control mechanism.

For Cronus, RFMs reside only on sites that also have primal
file managers (PFMs). The manager's image of the file is stored
at the manager's site. RFMs, of course, access the file images
through PFMs in the normal fashion.


There is an issue regarding the relation of RFMs to PFMs.
They could be implemented either as two completely separate
managers which communicate by means of interprocess communication
or as a single, combined manager for both CT_Primal_File and
CT_Reliable_File. The initial implementation of reliable files
will be accomplished by means of RFMs that are separate from the
PFMs. Later implementations may integrate the RFM functions into
(some of) the PFMs.

In addition to the information maintained in descriptors for
primal files, object descriptors for reliable files contain the
following information:

     File cardinality;
     ID of primary site (see below);
     Version vector for the local image of the file
     (see below);
     Version vector for the local image of the
     descriptor (see below);
     List of UIDs for the primal files that implement
     images of the file.


8.3.3 Synchronization Considerations 


In order to maintain the consistency of images of reliable
files and the integrity of internal file data (for primal as well
as reliable files), Cronus must control and synchronize the
manner in which clients access the files.


The general Cronus approach to synchronization for reliable
files can be characterized as a best-effort approach consisting
of the following steps:

1. try to synchronize access;

2. if synchronization cannot be achieved, permit access if
   the client so desires;

3. be prepared to detect, and later deal with, inconsistencies
   that may result from unsynchronized access.


A specific concurrency control mechanism must be chosen.
Although much has been written about concurrency control and
synchronization for multiple-copy files and data bases, there is
little practical experience on which to base a choice. We have
decided to use a simple mechanism for Cronus. Should the
mechanism prove to be inadequate (for example, because it cannot
achieve synchronization often enough, given the failure patterns
observed in Cronus), it will be replaced with a more capable (and
complex) one.


Synchronization will be accomplished by means of a
primary/secondary image approach. Each reliable file will have
one primary image and one or more secondary images. All attempts
to synchronize access to a reliable file will require
synchronization with the primary image. We refer to the manager
of the primary image as the primary manager for the file;
managers of other images are called secondary managers.


When a client attempts to access file data in a way that
requires synchronization, an attempt will be made to synchronize
with the primary image of the file. If the client's access
attempt is initiated with the manager for the primary image,
synchronization occurs as for primal files. If the access
attempt is initiated with the manager for a secondary image of
the file, the secondary manager interacts with the primary
manager to gain the appropriate kind of access (non-exclusive
read, exclusive write).


RFMs use a locking discipline to support synchronization.
This discipline works roughly as follows. When an attempt to
open a file for reading is handled by a secondary manager, the
manager tries to set its lock for the file to "reserved for
reading". The attempt to set the lock fails if the file is
already locked for writing. Next, the manager interacts with the
primary manager to try to set the primary manager's lock for the
file. If this succeeds, the secondary manager sets its lock to
"locked for reading" and proceeds with the open. If the primary
has the file locked for writing, the secondary manager clears its
lock and reports to the client that the file is busy. When the
file is closed, both the local lock and the primary manager's
lock for the file are cleared. Attempts to open a file for
writing are handled in an analogous fashion. This locking
discipline is described in more detail in the Cronus User's
Manual.
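
The read-lock sequence above can be sketched in Python. This is a
minimal, single-process illustration under stated assumptions:
PrimaryManager, SecondaryManager and the Lock states are
hypothetical names, the network interaction with the primary
manager is modeled as a direct method call, and each secondary
manager is assumed to handle at most one open of a given file at a
time.

```python
from enum import Enum


class Lock(Enum):
    FREE = "free"
    RESERVED_READ = "reserved for reading"
    LOCKED_READ = "locked for reading"
    LOCKED_WRITE = "locked for writing"


class PrimaryManager:
    """Hypothetical stand-in for the primary RFM's lock table."""

    def __init__(self):
        self.locks = {}    # file UID -> Lock
        self.readers = {}  # file UID -> number of read locks granted

    def try_lock_read(self, uid):
        # Reads are non-exclusive; only a write lock blocks them.
        if self.locks.get(uid, Lock.FREE) is Lock.LOCKED_WRITE:
            return False
        self.locks[uid] = Lock.LOCKED_READ
        self.readers[uid] = self.readers.get(uid, 0) + 1
        return True

    def clear_read(self, uid):
        self.readers[uid] -= 1
        if self.readers[uid] == 0:
            self.locks[uid] = Lock.FREE


class SecondaryManager:
    """Hypothetical secondary RFM; at most one open per file assumed."""

    def __init__(self, primary):
        self.primary = primary
        self.locks = {}

    def open_for_read(self, uid):
        """Return True if the open proceeds, False if the file is busy."""
        # Step 1: reserve locally; fails if locally locked for writing.
        if self.locks.get(uid, Lock.FREE) is Lock.LOCKED_WRITE:
            return False
        self.locks[uid] = Lock.RESERVED_READ
        # Step 2: try to set the primary manager's lock.
        if not self.primary.try_lock_read(uid):
            self.locks[uid] = Lock.FREE  # clear the reservation
            return False
        # Step 3: promote the reservation and proceed with the open.
        self.locks[uid] = Lock.LOCKED_READ
        return True

    def close(self, uid):
        # Closing clears both the local lock and the primary's lock.
        self.locks[uid] = Lock.FREE
        self.primary.clear_read(uid)
```

Opens for writing would follow the same pattern with an exclusive
lock at both managers; they are omitted to keep the sketch short.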


The reliable file system supports the notion of free reads
and writes. For a free read, the synchronization outlined in
Table 8.1 is performed by the file manager which handles the
client's read, but no attempt to synchronize with the primary
manager is made. Free write operations require synchronization
with the primary manager.


If synchronization for any operation fails because the
primary manager cannot be reached, the operation may proceed, but
only with the explicit consent of the client and, of course, at
some risk. The risk is that different images of the file may be
undergoing unsynchronized access and, as a result, the file
images may diverge into inconsistent states.

A client may specify its intent with regard to
unsynchronized access when it initiates a file operation by means
of an optional operation parameter. Alternatively, the client
may choose not to specify the action to be taken when it invokes
the operation, in which case, if synchronization cannot be
achieved, the manager will ask whether it should proceed with or
abort the operation.
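
The choice between proceeding and aborting can be sketched as
follows. The parameter name proceed_unsynchronized, the
PrimaryUnreachable exception and the ask_client callback are
hypothetical; the sketch only illustrates the rule that
unsynchronized access happens either by prior declaration or with
the client's answer to the manager's question.

```python
from typing import Callable, Optional


class PrimaryUnreachable(Exception):
    """Raised by the synchronization step when the primary cannot be reached."""


def run_operation(synchronize: Callable[[], None],
                  perform: Callable[[], str],
                  proceed_unsynchronized: Optional[bool] = None,
                  ask_client: Callable[[], bool] = lambda: False) -> str:
    try:
        synchronize()
    except PrimaryUnreachable:
        if proceed_unsynchronized is None:
            # The client stated no intent, so the manager asks it now.
            proceed = ask_client()
        else:
            proceed = proceed_unsynchronized
        if not proceed:
            return "aborted"
        # Proceeding here risks divergent images, to be detected later.
    return perform()
```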

Inconsistent images of a file can be detected by means of
the version vector mechanism developed at UCLA. A version vector
for a reliable file, RF, is a set of N ordered pairs, where N is
the number of sites at which RF is stored. A particular pair
(Si, Vi) counts the number of times updates to RF were initiated
at Si. Thus, each time an update to RF originates at Si, Vi is
incremented by one. The version vector is part of the object
descriptor for RF.





Two images of a reliable file are said to be consistent if
the modification history of one is the same as, or is an initial
subsequence of, that of the other. It can be shown that two
images are consistent if one of the vectors is at least as large
as the other in every (Si, Vi) pair. The larger vector is said
to dominate the smaller, and the image corresponding to it
represents a later, consistent version of the image corresponding
to the smaller vector. If two vectors are such that neither
dominates the other (that is, some pairs in one are larger than
the corresponding pairs in the other and vice versa), then the
corresponding file images are inconsistent with one another.
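
The dominance test can be written directly from this definition.
In this sketch a version vector is represented, as an assumption,
by a Python dict mapping site names to update counts, with a
missing site treated as a count of zero; compare and record_update
are hypothetical helper names.

```python
from typing import Dict

VersionVector = Dict[str, int]


def dominates(a: VersionVector, b: VersionVector) -> bool:
    """True if a is at least as large as b in every (Si, Vi) pair."""
    sites = set(a) | set(b)
    return all(a.get(s, 0) >= b.get(s, 0) for s in sites)


def compare(supplied: VersionVector, local: VersionVector) -> str:
    ge, le = dominates(supplied, local), dominates(local, supplied)
    if ge and le:
        return "same"
    if ge:
        return "dominates"
    if le:
        return "dominated"
    return "incompatible"  # neither dominates: images are inconsistent


def record_update(vv: VersionVector, site: str) -> VersionVector:
    """Return a copy of vv after one update originating at site."""
    new = dict(vv)
    new[site] = new.get(site, 0) + 1
    return new
```

For example, if images at sites A and B both start from
{"A": 0, "B": 0} and each records one local update while the sites
are partitioned, the resulting vectors {"A": 1, "B": 0} and
{"A": 0, "B": 1} compare as "incompatible".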


Since the descriptor for a file may undergo modification 
independently of the file data, descriptors for reliable files 
also have version vectors. 


The question of when version vectors for file images should
be compared and what to do if they are not equal is discussed in
Section 8.3.6. The synchronization mechanism for reliable files
outlined here is described in more detail in the Cronus User's
Manual.


8.3.4 Interactions Among Reliable File Managers


RFMs must interact with one another in order to maintain
reliable files. For example, when a reliable file is updated,
the new file data must be transmitted to each site that has an
image of the file.


Occasionally an RFM that must participate in such an
interaction will be inaccessible. It is important that when, if
ever, such an RFM becomes accessible, the interaction occurs. It is
the responsibility of the initiating RFM to ensure that the
interaction occurs. The mechanism used by RFMs to do this is as
follows.


Each RFM maintains a PendingActions data base which contains
a record for each operation it was unable to completely perform
due to its inability to interact with other RFMs. Each such
record includes:

    the UID of the reliable file;

    a specification of the action required to complete
    the operation;

    a list of the sites at which the action must be
    performed (for some actions, this list may be empty).

Whenever the RFM is unable to complete an operation, it adds
a record to the PendingActions data base to describe the actions
necessary to complete the operation. Subsequently, at regular
intervals, the RFM scans the PendingActions data base and, for
each record, attempts to perform the necessary interactions.
If the RFM succeeds in performing some, but not all, of the
interactions, it updates the record. When all of the
interactions described by a record are successfully performed,
the record is removed from the data base.

The actions that may be found in records in the PendingActions
data base are:

    b. Update the descriptor for a file.

    c. Update a file itself.

When an RFM comes up for the first time, its PendingActions
data base is empty, and if sites and the network never failed, the
data base would remain empty.
The PendingActions data base should be stored in a
reasonably reliable fashion. It is probably adequate to store it
as a primal file on the RFM's local site.
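
The record-and-retry cycle can be sketched as below. PendingAction,
PendingActions.scan and the try_interact callback are hypothetical
names; the sketch keeps the data base in memory, whereas the text
suggests storing it as a primal file, and it leaves the scheduling
of the periodic scans to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class PendingAction:
    uid: str                 # UID of the reliable file
    action: str              # e.g. "update-descriptor", "update-file"
    sites: Set[str] = field(default_factory=set)  # may be empty


class PendingActions:
    def __init__(self):
        self.records: List[PendingAction] = []

    def add(self, uid: str, action: str, sites=()):
        self.records.append(PendingAction(uid, action, set(sites)))

    def scan(self, try_interact: Callable[[str, str, Optional[str]], bool]):
        """One periodic pass: retry every record, keep what still fails."""
        remaining = []
        for rec in self.records:
            if rec.sites:
                # Drop each site whose interaction now succeeds; a partial
                # success simply updates the record.
                rec.sites = {s for s in rec.sites
                             if not try_interact(rec.uid, rec.action, s)}
                done = not rec.sites
            else:
                # Action not tied to particular sites.
                done = try_interact(rec.uid, rec.action, None)
            if not done:
                remaining.append(rec)
        self.records = remaining
```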


8.3.5 Operations on Reliable Files


The operations supported for primal files are also supported
for reliable files. Three additional operations are supported
for reliable files. The Change_Cardinality operation changes the
cardinality of a reliable file. The File_Sites operation
produces a list of the sites that are thought to be maintaining
images of the file, with the primary file site distinguished.
The Move_Image_To_Site operation moves a file image from one site
to another (removing the image at the source site).

The design of reliable files is conveniently described in
terms of the normal life cycle for a file, which is much the same
as that for a primal file. The principal exception is that the
cardinality of the file may change. The life cycle includes:

    a. The file is created.

    b. Data in the file may be read by a client.

    c. Data in the file may be written by a client.

    d. Information in the file descriptor may be read by a
       client.

    e. Information in the file descriptor may be written by a
       client.

    f. The cardinality of the file may be changed.

    g. The file may be deleted.


The following sections discuss these operations.


8.3.5.1 Creating Reliable Files


A reliable file must be created before data can be written
into it, and until data is written into the file, the file
remains empty.


To create a reliable file, the client invokes the Create
operation, specifying the cardinality of the file as a parameter.
The RFM that receives the Create operation becomes the primary
manager for the file.


For the initial implementation of reliable files, clients
may exercise control only over where primary file images are
maintained. If the Create operation is requested by means of
InvokeOnHost, then the RFM at that host becomes the primary
manager; otherwise, the RFM selected by the interprocess
communication facility becomes the primary manager. Later
implementations may provide means for client processes (as well
as for users through the user interface) to exercise control over
the initial placement of secondary images. After images are in
place, the Move_Image_To_Site operation can be used to move an
image from one site to another.

When an RFM receives a Create operation, it:

    a. Creates an (empty) primal file for the primary image of
       the reliable file, and obtains its UID (UID_pf).

    b. Allocates a UID (UID_rf) for the reliable file, and makes
       an entry for it in its UID table.

    c. Creates and initializes a descriptor for the reliable
       file. The following descriptor fields are initialized:

           the cardinality;
           the primary site;
           the file version vector and descriptor version
           vector;
           the list of UIDs for images, which is initialized to
           include UID_pf.

    d. Returns UID_rf to the client, indicating that the Create
       succeeded.
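
Steps a through d can be sketched as follows. The class names, the
counter-based UID generator, and the initial version-vector
contents (a single zero entry for the primary site) are
assumptions made for illustration; the text does not specify how
UIDs are generated or what initial values the version vectors
hold.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List

_uid_counter = itertools.count(1)


def new_uid(kind: str) -> str:
    """Hypothetical UID generator; real Cronus UIDs are not strings."""
    return f"{kind}-{next(_uid_counter)}"


@dataclass
class ReliableFileDescriptor:
    cardinality: int
    primary_site: str
    file_vv: Dict[str, int]
    descriptor_vv: Dict[str, int]
    image_uids: List[str]


class ReliableFileManager:
    def __init__(self, site: str):
        self.site = site
        self.primal_files: Dict[str, bytes] = {}
        self.uid_table: Dict[str, ReliableFileDescriptor] = {}

    def create(self, cardinality: int) -> str:
        # a. Create an empty primal file for the primary image.
        uid_pf = new_uid("pf")
        self.primal_files[uid_pf] = b""
        # b. Allocate a UID for the reliable file (its UID-table
        #    entry is made below, together with the descriptor).
        uid_rf = new_uid("rf")
        # c. Initialize the descriptor; no secondary images exist yet.
        self.uid_table[uid_rf] = ReliableFileDescriptor(
            cardinality=cardinality,
            primary_site=self.site,
            file_vv={self.site: 0},
            descriptor_vv={self.site: 0},
            image_uids=[uid_pf],
        )
        # d. Report success by returning UID_rf.
        return uid_rf
```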

Secondary images of the file are not created until the file is
written the first time (that is, after a free write, or after
the file is opened, written into, and closed).

When a reliable file is first written, and whenever the file
cardinality is increased, the RFM selects sites to store images
of the file. The acquisition of new sites involves three steps:

    a. The selection of the new sites.

    b. Obtaining commitments from the RFMs at the selected sites
       to store images of the file.

    c. Updating file descriptors at each of the file sites to
       reflect the new sites.

The RFM acquisition procedure is structured so that an RFM
need not, as part of a single acquisition attempt, acquire every
site required to support a file's cardinality. An RFM can
support operations on a reliable file even if not all of the
desired images of the file have been created. When an RFM is
unable to acquire all the sites necessary to achieve the desired
file cardinality, it creates a record in its PendingActions data
base to ensure that the additional sites will be acquired.
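
A sketch of one acquisition attempt follows, under the assumption
that candidate sites and a commit callback (standing in for the
negotiation with remote RFMs in step b) are supplied by the
caller. acquire_sites and the ("acquire-sites", n) record format
are hypothetical. Step c, propagating the new site list to every
file site, is left to the caller.

```python
from typing import Callable, List, Tuple


def acquire_sites(current: List[str], cardinality: int,
                  candidates: List[str],
                  commit: Callable[[str], bool],
                  pending: List[Tuple[str, int]]) -> List[str]:
    """Return the updated site list after one acquisition attempt."""
    needed = cardinality - len(current)
    acquired: List[str] = []
    for site in candidates:                        # a. select new sites
        if len(acquired) >= needed:
            break
        if site not in current and commit(site):   # b. obtain commitment
            acquired.append(site)
    sites = current + acquired
    shortfall = cardinality - len(sites)
    if shortfall > 0:
        # Partial success is acceptable; remember the rest for later.
        pending.append(("acquire-sites", shortfall))
    return sites
```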


The acquisition procedure is described in more detail in the
Cronus User's Manual.


8.3.5.2 Reading Reliable Files 


Reading a reliable file is similar to reading a primal file.
File data may be read by means of a free read operation, or by
opening the file prior to performing read operations. In either
case, the interprocess communication facility delivers the
operations to an RFM that manages the file.


There are several differences in dealing with reliable files
which are visible to a client. These include the following:


a. The interaction between the RFM that receives the
operation and the primary RFM for the file in order to
achieve synchronization is not visible to the client.
However, should the synchronization fail because the
primary RFM is inaccessible, the client will be informed
and given an opportunity either to continue with the
access or to abort it.


b. A client process can obtain a list of the sites that have
images of a reliable file, and it can choose which RFM to
deal with to access the file. For example, it might
choose the primary RFM or, if an RFM happens to reside
on the same host as the client, it might choose that one.


c. After it opens a file, the client should continue to deal
with the same RFM for operations on the open file until
it closes the file.

8.3.5.3 Writing Reliable Files

Writing a reliable file is similar to writing a primal file.
The principal differences are essentially those noted above for
reading reliable files: the required synchronization may fail due
to the inaccessibility of the primary manager for the file, in
which case the client must decide whether to proceed at some risk
or to abort the write; the client may choose the RFM with which
it deals; and, after it has opened a reliable file for writing, a
client should deal with the same RFM for operations on the open
file until it closes the file.

File data must be updated after a free write or after a file
opened for writing has been closed (if writes have actually been
made and are to be retained).

The RFM at which the writes are performed is responsible for
distributing updates to the other file images. It does this by
interacting with the other RFMs in the following way:

    a. It increments its (Site, Version) element of the file
       version vector.

    b. It attempts to interact with each other RFM that manages
       an image of the file.

    c. Should it fail to complete the image update with any RFM,
       it adds a record to the PendingActions data base
       specifying the file and the RFMs it was unable to update.
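
Steps a through c can be sketched as follows. distribute_update,
send_update (which stands in for the complete UpdateImage exchange
with one RFM) and the ("update-file", sites) record format are
hypothetical; version vectors are assumed to be dicts from site
names to counts.

```python
from typing import Callable, Dict, List


def distribute_update(my_site: str, file_vv: Dict[str, int],
                      image_sites: List[str],
                      send_update: Callable[[str, Dict[str, int]], bool],
                      pending: List[tuple]) -> Dict[str, int]:
    # a. Count this update as originating at my_site.
    vv = dict(file_vv)
    vv[my_site] = vv.get(my_site, 0) + 1
    # b. Try to update every other image.
    failed = [s for s in image_sites
              if s != my_site and not send_update(s, vv)]
    # c. Leave a PendingActions record for the images not reached.
    if failed:
        pending.append(("update-file", failed))
    return vv
```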


The actual update procedure for a particular image involves
several exchanges between the initiating RFM (iRFM) and the
responding RFM (rRFM), and works roughly as follows:

    a. iRFM does InvokeOnHost(SiteOf(rRFM), UID,
       UpdateImage, DVV, FVV),

       where UID is the UID of the reliable file, DVV is the
       version vector for the file descriptor, and FVV is the
       version vector for the file itself.

    b. rRFM compares both DVV and FVV against the descriptor and
       file version vectors it maintains for UID. Assuming that
       DVV and FVV dominate the corresponding version vectors at
       rRFM, rRFM returns to iRFM a SendTheDescriptor message.
       (Section 8.3.6 discusses what happens if iRFM's version
       vectors are dominated by or are incompatible with
       rRFM's.)


    c. When iRFM receives the SendTheDescriptor message, it
       sends the new value of the file descriptor to rRFM in a
       HereIsTheDescriptor message.


    d. rRFM receives the file descriptor and updates its copy of
       the descriptor. It then returns to iRFM a SendTheFileUpdate
       message.


    e. When iRFM receives the SendTheFileUpdate message, it
       transmits the file update to rRFM in a
       HereIsTheFileUpdate message. Depending on the nature of
       the changes to be made to the file image, the update may
       be transmitted by sending the entire file or by sending
       only the changes that need to be made to the file to
       update it.


    f. Finally, after it has stored the new file data in the
       primal file that holds its image of the file, rRFM
       returns an UpdateImageSucceeded message to iRFM.
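
From rRFM's side, the exchange can be sketched as a single
function. The callbacks send_the_descriptor and
send_the_file_update stand in for the SendTheDescriptor /
HereIsTheDescriptor and SendTheFileUpdate / HereIsTheFileUpdate
round trips. The function name, the dict used for local state, the
"NotUpdated" result, and the whole-file update (rather than a list
of changes) are assumptions. As a further assumption, the file
vector must strictly dominate the local one (iRFM has just
incremented its own element), while the descriptor vector may be
unchanged; other outcomes are only reported, since Section 8.3.6
defines how they are handled.

```python
from typing import Callable, Dict

VV = Dict[str, int]


def dominates(a: VV, b: VV) -> bool:
    return all(a.get(s, 0) >= b.get(s, 0) for s in set(a) | set(b))


def handle_update_image(local: dict, dvv: VV, fvv: VV,
                        send_the_descriptor: Callable[[], object],
                        send_the_file_update: Callable[[], bytes]) -> str:
    # b. Compare the supplied vectors with the local ones.
    file_newer = dominates(fvv, local["fvv"]) and not dominates(local["fvv"], fvv)
    if not (dominates(dvv, local["dvv"]) and file_newer):
        return "NotUpdated"  # handled as described in Section 8.3.6
    # c-d. Ask for and install the new descriptor.
    local["descriptor"] = send_the_descriptor()
    local["dvv"] = dict(dvv)
    # e-f. Ask for the file update, store it, and report success.
    local["data"] = send_the_file_update()
    local["fvv"] = dict(fvv)
    return "UpdateImageSucceeded"
```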


8.3.5.4 Other Operations

This section describes the Change_Cardinality and 
Move_Image_To_Site operations. Both operations require 
synchronization with the primary manager. 


Change_Cardinality is used to change the number of images
the system tries to maintain for a reliable file. An increase in
the cardinality is accomplished by execution of the acquisition
procedure described in Section 8.3.5.1. Decreasing the
cardinality is roughly the inverse of increasing it. The
performing manager selects a site or a set of sites which
currently maintain images of the file and asks the manager at
each to agree to discard its image of the file and to remove the
file from its UID table. After each agrees, the performing
manager instructs each to discard the image, and instructs the
remaining managers to update their descriptors for the file.
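
The two rounds of a decrease (first agreement, then action) can be
sketched as follows. decrease_cardinality and its callbacks are
hypothetical, and leaving the site list unchanged when any selected
manager refuses is an assumption; the text does not say what
happens in that case.

```python
from typing import Callable, List


def decrease_cardinality(sites: List[str], to_remove: List[str],
                         agree: Callable[[str], bool],
                         discard: Callable[[str], None],
                         update_descriptor: Callable[[str, List[str]], None]
                         ) -> List[str]:
    # Round 1: every selected manager must agree to give up its image.
    if not all(agree(s) for s in to_remove):
        return list(sites)  # assumption: no change unless all agree
    # Round 2: discard the selected images ...
    for s in to_remove:
        discard(s)
    # ... and have the remaining managers record the new site list.
    remaining = [s for s in sites if s not in to_remove]
    for s in remaining:
        update_descriptor(s, remaining)
    return remaining
```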




Move_Image_To_Site moves a file image from one site to 
another, preserving the file cardinality. The parameters of the 
operation are the file UID, the site of the image to move, and a 
new site to hold the image. The operation involves creating an 
image of the file at the new site, discarding the image at the 
old site, and updating the descriptors held by all managers of 
the file to reflect the change. 


8.3.6 Use of Version Vectors 


Version vectors are used to detect inconsistent images of
reliable files. In the current design, both the descriptor for a
file and the file itself are protected by version vectors.


Version vectors are compared in two situations:


    When an image of a file is updated. The RFM initiating
    the image update supplies its version vectors, and the
    responding RFM compares them with its own.

    When an attempt is made to lock a file for read or write
    access. The secondary RFM attempting to lock the file
    supplies the primary RFM with its version vectors, and the
    primary RFM does the comparison.


In each situation, both the descriptor version vector and 
the file data version vector are compared. There are four 
possible outcomes for the comparison of version vectors: 


    a. The supplied version vector is the same as the local
       version vector.

    b. The supplied version vector dominates the local version
       vector.

    c. The supplied version vector is dominated by the local
       version vector.

    d. The two version vectors are incompatible.


The actions taken for these outcomes depend upon whether image 
updating or file locking is taking place. 


For updating, the version vectors are compared by the RFM
whose image is about to be updated. The various comparison
outcomes and the actions to be taken for each are:

    a. The supplied version vector is the same as the local
       version vector. Since the updating RFM increments its
       element of the version vector prior to sending it for
       comparison, this case should not occur if the RFMs are
       behaving properly. If it does, some RFM has been
       misbehaving. The update should be deferred and the
       operations staff should be alerted by means of a message
       to the Monitoring and Control System.

    b. The supplied version vector dominates the local version
       vector. This is the normal case, since the local image
       is being updated. In this case, the image update should
       proceed.

    c. The supplied version vector is dominated by the local
       version vector. In this case, the local image is more
       recent than the one that is to replace it. The update
       should be aborted, and the local version should be used
       to update the remote version.

    d. The version vectors are incompatible. This indicates an
       inconsistency. The update should be deferred until human
       intervention can clear up the problem.
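
The four updating rules amount to a lookup on the outcome of the
comparison. This sketch repeats a dict-based compare helper so that
it stands alone; it handles a single pair of vectors, whereas the
text compares both the descriptor and the file vectors, and the
returned strings are paraphrases, not Cronus message names.

```python
from typing import Dict

VV = Dict[str, int]


def dominates(a: VV, b: VV) -> bool:
    return all(a.get(s, 0) >= b.get(s, 0) for s in set(a) | set(b))


def compare(supplied: VV, local: VV) -> str:
    ge, le = dominates(supplied, local), dominates(local, supplied)
    return ("same" if ge and le else "dominates" if ge
            else "dominated" if le else "incompatible")


UPDATE_ACTIONS = {
    "same": "defer; alert the Monitoring and Control System",
    "dominates": "proceed with the image update",
    "dominated": "abort; use the local image to update the remote one",
    "incompatible": "defer until human intervention",
}


def update_action(supplied: VV, local: VV) -> str:
    """Action taken by the RFM whose image is about to be updated."""
    return UPDATE_ACTIONS[compare(supplied, local)]
```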


In the locking situation, the version vectors are compared by
the primary RFM for the file in question:

    a. The supplied version vector is the same as the local
       version vector. This should be the normal case, and
       locking can proceed.

    b. The supplied version vector dominates the local version
       vector. In this case, the primary image is obsolete and
       should be brought up to date. If the file is being
       locked for writing, the locking should proceed, and the
       local image can be updated when the file is closed. If
       the file is being locked for reading, there are two
       possibilities: either the primary file image could be
       updated before proceeding with the locking, or the
       locking could proceed and the file could be updated when
       the lock is cleared.

    c. The supplied version vector is dominated by the local
       version vector. The secondary image should be updated
       before proceeding. If the file is being locked for
       reading, then the file image at the secondary site should
       be updated so that the client is given access to the most
       current file data. If the file is being locked for
       writing, then the secondary file image must be updated
       first to avoid incompatibility.

    d. The version vectors are incompatible. If the file is
       being locked for reading, the locking may proceed, but an
       attempt should be made to signal a user or operator to
       resolve the incompatibility. If the file is being
       locked for writing, the client should be informed of the
       incompatibility and given an opportunity to resolve it.
       The client may proceed without resolving the
       incompatibility, in which case the write is treated as an
       unsynchronized write.
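
The primary RFM's decision for locking depends on both the
comparison outcome and the lock mode, as sketched below.
lock_action, its "read"/"write" mode strings, and the returned
phrases are hypothetical; the "dominates" case for reading is
collapsed into one answer because the text leaves the choice of
timing open.

```python
from typing import Dict

VV = Dict[str, int]


def dominates(a: VV, b: VV) -> bool:
    return all(a.get(s, 0) >= b.get(s, 0) for s in set(a) | set(b))


def lock_action(supplied: VV, local: VV, mode: str) -> str:
    """Decision made by the primary RFM; mode is "read" or "write"."""
    ge, le = dominates(supplied, local), dominates(local, supplied)
    if ge and le:
        return "lock"
    if ge:  # the primary image is obsolete
        if mode == "write":
            return "lock; update primary image when the file is closed"
        return "update primary image before or after locking"
    if le:  # the secondary image is out of date
        return "update secondary image, then lock"
    # Incompatible images.
    if mode == "read":
        return "lock; signal a user or operator"
    return "inform client; write is unsynchronized if client proceeds"
```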




8.4 Elementary File System 
8.4.1 Introduction 


The Elementary File System (EFS) is an easily ported single
host file system that serves as a common base of implementation
support for Cronus file managers on Cronus Generic Computing
Elements (GCEs) configured with disks, on the UNIX system, and on
the VAX. The underlying implementation of the EFS is constituent
host dependent, but the interface presented to the Cronus File
Manager is uniform. As a result, portability of the File Manager
is enhanced, and the cost of integration of new hosts is reduced.
The EFS was originally developed as a primitive file storage
capability for the GCE mass storage devices.

The two principal design objectives of the EFS are:

    1. Sufficient functional capability to support the Cronus
       distributed file system.

    2. Simplicity and efficiency.

The principal users of the EFS will be object managers.
Client processes will seldom, if ever, directly access files
through the EFS. Therefore, only the most basic file
operations need be supported. More complex file functions
can be supported by the object managers themselves. Simple
steps have been taken in the internal organization of the
EFS to support effective crash recovery and system restart
procedures.

The Elementary File System will have the following
characteristics:

    1. The name space for EFS files is flat. Names for EFS files
       are called FileIDs, and they are fixed-length bit strings.
       FileIDs are not Cronus UIDs. A FileID is unique on the EFS
       that generated it, but it is not unique across all Cronus
       hosts.

    2. The EFS is a Cronus object in much the same way that the
       existing UNIX or VMS file systems are Cronus objects, but
       an EFS file is not a Cronus object.

    3. File data is organized as a sequence of fixed-length
       blocks. File i/o is sequential and block oriented. The
       basic file i/o operations are:

           ReadEFSFileBlock(FileID, BlockNumber, Buffer), and
           WriteEFSFileBlock(FileID, BlockNumber, Buffer).

    4. There are no open or close operations. No setup is
       necessary to read data from or write data to an existing
       EFS file.

It is necessary to create an EFS file before writing data to
it. This is accomplished by the

    CreateEFSFile()

operation, which creates an empty EFS file and returns its
FileID.

EFS files are deleted by the

    DeleteEFSFile(FileID)

operation.

There is no access control for EFS files. Possession of the
FileID for an EFS file is sufficient to access the file.
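
To make the interface concrete, here is an in-memory sketch of the
EFS operations. The Python method names mirror CreateEFSFile,
WriteEFSFileBlock, ReadEFSFileBlock and DeleteEFSFile but are
hypothetical; the 512-byte block size is an assumption; FileIDs
are plain integers starting after the well-known FileIDs 1 to 3
that the EFS uses for its own tables; and the read method returns
the block instead of filling a Buffer argument.

```python
from typing import Dict

BLOCK_SIZE = 512  # assumption; the EFS only requires fixed-length blocks


class ElementaryFileSystem:
    """In-memory illustration of the EFS interface (no access control)."""

    FIRST_USER_FILEID = 4  # FileIDs 1-3 hold the EFS's own tables

    def __init__(self):
        self._files: Dict[int, Dict[int, bytes]] = {}
        self._next_id = self.FIRST_USER_FILEID

    def create_efs_file(self) -> int:
        file_id = self._next_id
        self._next_id += 1
        self._files[file_id] = {}  # empty file: no blocks yet
        return file_id

    def write_efs_file_block(self, file_id: int, block: int, buf: bytes):
        if len(buf) > BLOCK_SIZE:
            raise ValueError("data exceeds one block")
        # No open/close: any holder of the FileID may write directly.
        self._files[file_id][block] = bytes(buf).ljust(BLOCK_SIZE, b"\0")

    def read_efs_file_block(self, file_id: int, block: int) -> bytes:
        return self._files[file_id][block]

    def delete_efs_file(self, file_id: int):
        del self._files[file_id]
```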


The EFS will normally be accessible only to Cronus Services. 
The primal file manager is an example of such a service. These 
services provide controlled access to the objects and operations 
that they implement. as described in Section 8. 


In addition to supporting the local primal file manager, the
EFS may be operated on as an object to permit remote access for
maintenance and debugging purposes. There is a single access
control list (ACL) associated with access to the entire EFS
through the EFS_File Manager. Only a very few principals will be
on the ACL for an EFS. An example of a principal which might be
granted access to the EFS is the "System Maintenance" principal.


8.4.2 File Formats

The following description of the Elementary File System
structure assumes that a disk can be represented by a series of
fixed-length blocks. In the Cronus ADM, the storage may be:

    a disk drive on a GCE,

    a disk device in a UNIX system, or

    a contiguous file on VAX/VMS.


The EFS makes few demands on the underlying recording medium, and
it is relatively easy to see that most potential constituent
operating systems will provide a construct upon which the EFS can
be built.


File disk blocks are self-identifying for reliability
purposes. Each block includes a header that contains the FileID
and the block number. The header in each block also contains a
NextBlock pointer, which is the disk address of the next block,
if any, in the file. The NextBlock pointer in the last block
contains a special value marking the end of file.


There is a FileID Table which provides a mapping between
FileIDs and the disk address of block 0 of the file (see Figure
1). The FileID Table is stored as a file with a well-known FileID
(FileID = 1). Its block 0 will be stored at a known disk address
(with an alternate copy stored at another location to prevent
loss of data in case the primary block is bad). The FileID Table
is a hash table.


There is a FreeDiskBlock table which records the disk blocks
that are available. The FreeDiskBlock table is a bit table
stored in a file with a well-known FileID (FileID = 2). Its
block 0 is stored at a known disk address. When a file is
deleted, its blocks are recorded in the FreeDiskBlock table, and
the FileID field in the headers of each of the blocks is cleared.
As disk blocks are needed, they are allocated using the
FreeDiskBlock table.
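
A bit table of this kind can be sketched as follows. The class
name, the convention that a set bit means "free", and first-fit
allocation of the lowest-numbered free block are assumptions; the
sketch does not model clearing the FileID field in the headers of
freed blocks.

```python
class FreeDiskBlockTable:
    """Bit table of free disk blocks: bit set = block available (assumed)."""

    def __init__(self, nblocks: int, reserved=()):
        self.nblocks = nblocks
        self.bits = bytearray(b"\xff" * ((nblocks + 7) // 8))
        for addr in reserved:          # e.g. well-known table blocks
            self._clear(addr)

    def _clear(self, addr: int):
        self.bits[addr // 8] &= ~(1 << (addr % 8)) & 0xFF

    def is_free(self, addr: int) -> bool:
        return bool(self.bits[addr // 8] & (1 << (addr % 8)))

    def free(self, addr: int):
        """Record a block of a deleted file as available again."""
        self.bits[addr // 8] |= 1 << (addr % 8)

    def allocate(self) -> int:
        """Return the lowest-numbered free block and mark it in use."""
        for addr in range(self.nblocks):
            if self.is_free(addr):
                self._clear(addr)
                return addr
        raise RuntimeError("no free disk blocks")
```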


There are two types of EFS files. The type of the file is
contained in the header of block 0. The types of EFS files are
(see Figure 2):

    a. Short file.

       This is a file, all of whose data will fit within block 0.

    b. Normal file.

       This is a file whose data will not fit within a single
       block.

A Normal file may contain index blocks which allow random access
to the file. By convention, the first of these blocks is given
block number -1, and contains:


A block index which holds the disk address of blocks 1 
through N of the file; and 

The disk addresses for two overflow blocks, named 
OverflowBlock1 and OverflowBlock2, which can be used to find 
the block index entries for blocks numbered greater than N. 


If the file is very large, not all of its index will fit into
block -1.


OverflowBlock1 is used as an index for blocks which store
the part of the block index which will not fit in block -1.
Specifically, if block -1 can store indices for blocks 1 through
N, if OverflowBlock1 can store M disk addresses as indices, and
if each block it indexes can store P disk addresses,
OverflowBlock1 can provide access to indices for M*P additional
blocks, numbered (N+1) through (N+M*P). The block index for the
Normal file shown in Figure 2 overflows block -1 into
OverflowBlock1, and is small enough that it doesn't require
OverflowBlock2.

OverflowBlock2 provides an additional level of indirection
for very large files. It contains an index for blocks which are
used in the same manner as OverflowBlock1. If OverflowBlock2 can
hold Q disk addresses as indices, then it can provide access to
indices for M*P*Q blocks, numbered (N+M*P+1) through
(N+M*P+M*P*Q).
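
The arithmetic can be checked with a small function that, given a
data block number k, reports which index structure holds its disk
address and the position at each level. index_location and its
tuple format are hypothetical; N, M, P and Q are the capacities
defined above, and the intermediate blocks under OverflowBlock2
are assumed to hold M addresses, as OverflowBlock1 does.

```python
def index_location(k: int, N: int, M: int, P: int, Q: int) -> tuple:
    """Locate the index entry for data block k (k >= 1)."""
    if 1 <= k <= N:
        return ("block -1", k - 1)
    i = k - N - 1                      # 0-based offset past block -1
    if i < M * P:
        # OverflowBlock1 entry, then slot in the block it points to.
        return ("OverflowBlock1", i // P, i % P)
    i -= M * P
    if i < M * P * Q:
        # OverflowBlock2 entry, intermediate entry, final slot.
        return ("OverflowBlock2", i // (M * P), (i // P) % M, i % P)
    raise ValueError("block number beyond the indexed range")
```

With N=4, M=2, P=3 and Q=2, the last addressable block is
N+M*P+M*P*Q = 22, and index_location(22, 4, 2, 3, 2) returns
("OverflowBlock2", 1, 1, 2).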




By convention, the BlockNumber for OverflowBlock1 is -2. Any
index blocks referenced by OverflowBlock1, as well as
OverflowBlock2 (if present) and any index blocks it references
directly or indirectly, are assigned BlockNumbers in a negative
sequential fashion starting at -3 in the obvious manner.


Some constituent hosts will have multiple disks (in the case
of UNIX, these may actually be disjoint regions on a single
physical disk, and in the case of VMS, they would be multiple
contiguous files). Part of the FileID specifies the disk on
which the file resides. The CreateEFSFile operation takes an
optional parameter which specifies a disk. If the parameter is
supplied, block 0 and all subsequently created blocks of the file
are allocated on the specified disk. If the parameter is not
supplied, block 0 and subsequent blocks are allocated on whichever
disk the EFS sees fit.


8.4.3 Disk Salvaging

There is a BadDiskBlock table which holds the disk addresses
of bad disk blocks. The BadDiskBlock table is stored in a file
with a well-known FileID (FileID = 3).

There is an EFS disk salvage operation which can reconstruct
the FileID table, the FreeDiskBlock file, and the BadDiskBlock
file, and reset the NextBlock pointers in files.

The salvager may encounter files with missing blocks. When
it does, it will fill in any hole it encounters with a newly
allocated filler block, linking the filler block into the file
where the hole was. The FileID of the filler block will be set
to the ID of the file, and its BlockNumber will be set to a
special BlockNumber which identifies it as a filler block. The
only data in a filler block will be the BlockNumbers of the
previous and next file blocks which contain data. Higher level
software can be used to recover the data in a file which contains
filler blocks.

As the salvage procedure encounters bad disk blocks, it
records them in the BadDiskBlock file. If it encounters a bad
block which is part of a file, the salvager will remove the block
from the file and substitute a newly allocated replacement block
by linking it with the other blocks of the file in place of the
bad block. The FileID of the replacement block will be set to
the ID of the file, and its BlockNumber will be set to a special
BlockNumber which identifies it as a replacement block. The only
data in the replacement block will be the BlockNumber of the
block it replaces. This will make it possible for higher level
software to recover the data in other blocks of the file.


9 Symbolic Naming
9.1 The Cronus Symbolic Name Space 


Cronus has a global symbolic name space with the following 
properties: 


    1. Cronus symbolic names are location independent.

       a. A name for an object is independent of its host.

       b. A name that refers to an object can be used
          regardless of the location from which it is used.

    2. Cronus symbolic names are uniform.

       Common syntactic conventions apply to names for different
       types of objects.


The symbolic name space is constructed upon a hierarchically
structured tree. The tree contains nodes and directed labeled
arcs. There is a distinguished node called the "root". Each
node other than the root has exactly one arc pointing to it, and
can be reached by traversing exactly one path of arcs from the
root node. Nodes in the tree represent Cronus objects which have
symbolic names. Links provide an overlaid structure based on
symbolic pointers, which makes the name space a network, so a
node may be reached by more than one path.
Non-terminal nodes (those from which arcs may originate) are
called directories. Each labeled arc corresponds to a catalog
entry. The label for an arc is called an “entry name”.


The complete name of a node, which is the symbolic name for
the object, is formed by concatenating the labels on the arcs
traversed on the path from the root node to the node in question,
separated with the character ":". In other words, the syntax for
a complete name is:

    :[x{:y}*]

where "x" and "y" are arc labels, the "[", "]" brackets indicate
optional presence, the ":" is a punctuation mark to separate name
components, and "{ s }*" means zero or more occurrences of s.


It is also possible to name nodes relative to a directory.
Such a relative name is formed by concatenating the labels on the
arcs traversed on the path from the directory in question to the
node. The syntax for a relative name is:

    x{:y}*
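
As an illustration of the two syntaxes, the sketch below builds
and splits complete and relative names. It assumes that ":"
separates components and that the root alone is named ":"; the
function names are hypothetical and are not part of the Cronus
interface.

```python
from typing import List, Tuple

SEP = ":"  # component separator in Cronus symbolic names


def complete_name(labels: List[str]) -> str:
    """Labels on the path from the root; the root alone is ':'."""
    return SEP + SEP.join(labels)


def relative_name(labels: List[str]) -> str:
    """Labels on the path from some starting directory (at least one)."""
    if not labels:
        raise ValueError("a relative name needs at least one label")
    return SEP.join(labels)


def split_name(name: str) -> Tuple[bool, List[str]]:
    """Return (is_complete, labels) for a complete or relative name."""
    is_complete = name.startswith(SEP)
    body = name[len(SEP):] if is_complete else name
    labels = body.split(SEP) if body else []
    if "" in labels:
        raise ValueError(f"empty component in {name!r}")
    return is_complete, labels
```

For example, complete_name(["a", "b", "c"]) yields ":a:b:c", and
split_name("b:c") reports a relative name with labels "b" and "c".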


There are conventional names for the current (“connected” or
“working”) directory, its parent, and the user's initial
directory.


The most common types of cataloged objects are the various
kinds of files, but any other object may be cataloged. Some
conventions will be adopted; for example, there will be a :dev
directory which contains the symbolic names for the devices on
the system. These conventions are not enforced by the system,
and any object may be entered into any directory (assuming
appropriate authorizations) at the convenience of the user.


There are certain special object types which are used in
support of the catalog itself, including:


o Directories

  A directory object (type CT_Directory) is a non-terminal
  node in the catalog tree.

o Links

  The catalog entry for a link (type CT_Symbolic_Link)
  identifies another point in the symbolic name space called
  the link target. These objects are stored in the catalog
  itself. Links are cataloged as terminal nodes in the name
  hierarchy tree. Links are handled specially within the
  Lookup operation.


o External linkages

  An external linkage (type CT_External_Linkage) is an object
  which implements access to another name space. External
  linkages are cataloged as terminal nodes in the name
  hierarchy tree. External linkages permit users to refer to
  non-Cronus objects directly from the Cronus name space. For
  example, an external linkage might be used to give a file
  directory on a Cronus application host a Cronus symbolic
  name.

For some object types it is useful to be able to think of a
collection of the objects as a sequence of “versions” or
“revisions” of the same logical object. The Cronus Catalog
implements a version feature for certain types of objects; for
example, versioning will be supported for files, but it will not
be supported for directories.

For types for which versioning is supported, the catalog
entry operation will permit the same name to be entered into a
directory more than once. Each copy of the entry will have a
distinct version field and should point to a different object.
However, all objects pointed to by different versions of the same
entry name must be of the same type. The first time a name is
entered, the result will be version 1 of the object. Subsequent
entries of the same entry name will result in successively higher
versions of the object. All of the catalog operations which take
a name parameter will allow the specification of a version number
as well.
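
A minimal sketch of these versioning rules, assuming an in-memory
directory: the first entry of a name becomes version 1, later
entries get successively higher versions, and every version must
refer to an object of the same type. The class and method names
are hypothetical, and returning the newest version when none is
specified is an assumption; the report says only that a version
number may be given.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    name: str
    version: int
    uid: int
    obj_type: str


class VersionedDirectory:
    """Sketch of versioned catalog entries; type names are illustrative."""

    NON_VERSIONED = {"CT_Directory"}  # versioning is not supported for directories

    def __init__(self) -> None:
        self._entries: Dict[str, List[Entry]] = {}

    def enter(self, name: str, uid: int, obj_type: str) -> Entry:
        versions = self._entries.setdefault(name, [])
        if versions:
            if obj_type in self.NON_VERSIONED:
                raise ValueError(f"{obj_type} objects are not versioned")
            if versions[0].obj_type != obj_type:
                raise TypeError("all versions of a name must have the same type")
        entry = Entry(name, len(versions) + 1, uid, obj_type)
        versions.append(entry)
        return entry

    def lookup(self, name: str, version: Optional[int] = None) -> Entry:
        versions = self._entries.get(name, [])
        if not versions:
            raise KeyError(name)
        if version is None:
            return versions[-1]   # assumption: no version means the newest one
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} version {version}")
        return versions[version - 1]
```

Version numbers are derived from the count of existing entries,
which is enough here because this sketch has no remove operation.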

The catalog managers provide routines that can scan through
the catalog and return catalog entries for names that match a
specified pattern.


The create(entry) operation can be used simply to establish
a symbolic name for a Cronus object of any type except a
directory, symbolic link, or external linkage object. Entries
for these three types are inserted in the catalog when the
objects are created. (Since other objects need not be named, the
creation of the object and the naming of the object are distinct
operations.) In a sense, these objects are special in that they
must have a symbolic name in addition to a UID.


Figure 9.1 shows a relatively simple symbolic name tree and
Figure 9.2 shows part of the underlying directory structure that
corresponds to the part of the tree that contains the name
:a:b:c.

[Figure 9.1: Catalog Hierarchy. The root directory is at the top
of the tree.]


[Figure 9.2: Implementation of Cronus Catalog. The figure shows
the files that implement the root directory and its
subdirectories, the catalog entries (each holding a directory
UID, an ACL, and a file UID), the Directory UID Table (managed
by the catalog manager), and the File UID Table (managed by the
file managers).]

When a lookup operation is invoked, the catalog manager
interprets a complete Cronus symbolic name by starting at the
root directory. The UID of the root directory is well-known.
The catalog manager processes a name component by searching the
current directory for a matching catalog entry. If it finds a
matching entry and there are no more name components, the lookup
is complete and it returns the catalog entry. If it finds a
matching entry and there are more name components to interpret,
the entry must be for a directory, symbolic link, or external
linkage, or else the lookup ends in failure. If the entry is a
directory, the catalog manager continues the lookup by obtaining
the UID for the directory from the entry and then using it to
interpret the next component. Interpretation of a relative
symbolic name is handled in the same fashion, differing only in
where the lookup starts. For a relative name, the catalog
manager starts its search at the starting directory parameter of
the lookup operation.

Symbolic links encountered during lookup are handled in a
special manner. When a link is encountered, a new name is formed
by substituting the link target, which is a complete Cronus
symbolic name held in the catalog entry, for the portion of the
symbolic name evaluated so far. The lookup operation then
resumes by interpreting this new name. Links can be thought of
as macros which are expanded during the lookup operation.


A parameter of the lookup operation controls whether links
are to be expanded. If the parameter specifies that links are to
be expanded, the substitution of link targets during the lookup
operation occurs. If the parameter is set to prevent links from
being expanded, the lookup operation terminates when a link is
encountered. In this case, the lookup operation will be
considered successful if the name has been completely evaluated
(that is, if the link is the last component of the name).
Otherwise it will be considered a failure.
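
The lookup procedure, including link substitution and the
link-expansion parameter, can be sketched as follows. This is an
illustration under stated assumptions rather than the catalog
manager's code: the whole catalog is one in-memory map from
directory UID to entries, ROOT_UID stands in for the well-known
root UID, MAX_EXPANSIONS is an invented guard against link
cycles, and external linkages are not followed.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ROOT_UID = 0          # stand-in for the well-known UID of the root directory
MAX_EXPANSIONS = 32   # assumption: bound on link expansions, to stop cycles


class CatalogLookupFailure(Exception):
    pass


@dataclass
class Entry:
    obj_type: str                 # e.g. "CT_Directory", "CT_Symbolic_Link"
    uid: Optional[int] = None     # UID of the named object
    target: Optional[str] = None  # complete name of a symbolic link's target


# Directory UID -> {entry name -> catalog entry}
Catalog = Dict[int, Dict[str, Entry]]


def lookup(catalog: Catalog, name: str, start_dir: Optional[int] = None,
           expand_links: bool = True) -> Entry:
    for _ in range(MAX_EXPANSIONS):
        if name.startswith(":"):           # complete name: start at the root
            current, parts = ROOT_UID, name[1:].split(":")
        elif start_dir is not None:        # relative name: start at start_dir
            current, parts = start_dir, name.split(":")
        else:
            raise CatalogLookupFailure("relative name without a starting directory")
        for i, part in enumerate(parts):
            entry = catalog[current].get(part)
            if entry is None:
                raise CatalogLookupFailure(f"no entry {part!r}")
            last = i == len(parts) - 1
            if entry.obj_type == "CT_Symbolic_Link":
                if not expand_links:
                    if last:
                        return entry       # name fully evaluated: success
                    raise CatalogLookupFailure("link found before end of name")
                # Substitute the target for the portion evaluated so far,
                # then restart the lookup on the new complete name.
                name = entry.target + "".join(":" + p for p in parts[i + 1:])
                break
            if last:
                return entry
            if entry.obj_type != "CT_Directory":
                # External linkages are left out of this sketch; any other
                # non-directory entry in mid-name ends the lookup in failure.
                raise CatalogLookupFailure(f"{part!r} is not a directory")
            current = entry.uid
    raise CatalogLookupFailure("too many link expansions")
```

Restarting from the root after each substitution mirrors the
description above: the link target replaces the portion of the
name evaluated so far, and the resulting complete name is
interpreted afresh.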


9.2 Objects Related to the Catalog
9.2.1 Objects of Type CT_Catalog_Entry


Each catalog entry is a Cronus object; however, unlike most
objects in Cronus, a catalog entry has no UID. A catalog entry
contains the following information:

    UID for the object;
    Complete symbolic name for the object;
    UID for creator of entry (PrincipalUID); and
    Type-dependent information.

Type-dependent information for objects of type CT_Directory,
CT_Symbolic_Link, and CT_External_Linkage is discussed below.
For objects that are not part of the Cronus catalog, everything
that can be known about an object is maintained by (or can be
obtained from) the manager for the object. That is, no type-
dependent information is maintained in the catalog.


9.2.2 Objects of Type CT_Directory 


For directories, no type-dependent information, except the
host that stores the directory, will be maintained in the
catalog entry. All other information about the directory will be
maintained with the directory object itself.


9.2.3 Objects of Type CT_Symbolic_Link 


For a symbolic link, the type-dependent information, which
completely specifies the link, consists of the complete symbolic
name for the link target. A catalog entry for a symbolic link
will include:

    UID;
    Complete symbolic name for the link;
    UID for creator of entry (PrincipalUID); and
    Complete symbolic name for the link target.

9.2.4 Objects of Type CT_External_Linkage


For an external linkage, the type-dependent information
completely specifies the external linkage. It includes a Cronus
interpretable designator for locating the other name space and a
symbolic name that is interpretable in that other name space.
The details of the method for designating other name spaces and
for interacting with them are incomplete. A catalog entry for an
external linkage will include:

    UID;
    Complete (Cronus) symbolic name for the external linkage;
    UID for creator of entry (PrincipalUID);
    Cronus interpretable designator for the other name space; and
    Symbolic name interpretable in the other name space.
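
The entry layouts listed in this section map naturally onto a
small hierarchy of record types. The Python below is a sketch of
that layout only; the class, field, and function names are
hypothetical, and UIDs and name-space designators are reduced to
plain integers and strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    uid: int            # UID of the named object (the entry itself has no UID)
    name: str           # complete symbolic name, e.g. ":a:b:c"
    creator_uid: int    # PrincipalUID of the entry's creator


@dataclass(frozen=True)
class DirectoryEntry(CatalogEntry):
    host: str           # the only type-dependent field kept for directories


@dataclass(frozen=True)
class SymbolicLinkEntry(CatalogEntry):
    target: str         # complete symbolic name of the link target


@dataclass(frozen=True)
class ExternalLinkageEntry(CatalogEntry):
    name_space: str     # Cronus-interpretable designator for the other name space
    foreign_name: str   # name interpretable in that other name space


def type_dependent_info(entry: CatalogEntry) -> tuple:
    """Return the type-dependent part of an entry (empty for ordinary objects)."""
    if isinstance(entry, DirectoryEntry):
        return (entry.host,)
    if isinstance(entry, SymbolicLinkEntry):
        return (entry.target,)
    if isinstance(entry, ExternalLinkageEntry):
        return (entry.name_space, entry.foreign_name)
    return ()  # other objects: everything else lives with the object's manager
```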


9.3 Catalog Operations 
9.3.1 Objects of Type CT_Catalog_Entry


The following operations are defined for the Cronus symbolic
catalog (see Cronus User's Manual cat_entry(3)):


Create 

Remove 

Lookup 

Read 

Change 
InitScan 
ScanDirectory 
LookupWild 


LookupWild performs a catalog lookup using Cronus wild card
conventions (see Cronus User's Manual sym_name(4)), and returns a
list of all the entries which match the specification. InitScan
and ScanDirectory perform the same function, but incrementally,
returning individual entries.
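
The difference between LookupWild and the InitScan/ScanDirectory
pair is whether matches are returned all at once or one at a
time. In the sketch below, Python's fnmatch patterns stand in for
the Cronus wild card conventions, which are defined in
sym_name(4) and may differ; a directory is reduced to a map from
entry name to object UID, and the function and class names are
hypothetical.

```python
import fnmatch
from typing import Dict, Iterator, List, Optional, Tuple

Directory = Dict[str, int]        # entry name -> object UID (simplified entry)
Match = Tuple[str, int]


def lookup_wild(directory: Directory, pattern: str) -> List[Match]:
    """LookupWild-style: return every matching entry in one list."""
    return [(n, directory[n]) for n in sorted(directory)
            if fnmatch.fnmatchcase(n, pattern)]


class Scan:
    """Incremental variant: created by init_scan, advanced by scan_directory."""

    def __init__(self, directory: Directory, pattern: str) -> None:
        self._directory = directory
        self._names: Iterator[str] = iter(sorted(directory))
        self._pattern = pattern

    def scan_directory(self) -> Optional[Match]:
        for name in self._names:   # resumes where the previous call stopped
            if fnmatch.fnmatchcase(name, self._pattern):
                return name, self._directory[name]
        return None                # scan exhausted


def init_scan(directory: Directory, pattern: str) -> Scan:
    return Scan(directory, pattern)
```

The incremental form lets a client stop early or process a large
directory without receiving the whole result in one reply.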

9.3.2 Objects of Type CT_Directory


The following special operations are defined for objects of
type CT_Directory (see Cronus User's Manual directory(3)):


Create

Remove

9.3.3 Objects of Type CT_Symbolic_Link


The following special operation is defined for objects of
type CT_Symbolic_Link (see Cronus User's Manual sym_link(3)):


Create 


9.3.4 Objects of Type CT_External_Linkage 


The following special operation is defined for objects of
type CT_External_Linkage (see Cronus User's Manual ext_link(3)):


Create 


9.3.5 Access Control for Catalog Operations 


All of the catalog operations are operations on one or more
directories. There are three rights defined for access control
purposes:


ReadDirectory,
WriteDirectory, and
ModifyACL.



ReadDirectory rights are needed for all operations which
return information from a directory. In operations which access
multiple directories, such as Lookup, ReadDirectory rights are
needed for each directory accessed. WriteDirectory rights are
needed for all operations which insert or remove entries from a
directory, or alter the contents of an entry (with the exception
of those which change the access rights). ModifyACL rights are
needed in order to change the access rights to an object
represented by a catalog entry.
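
The rules above can be expressed as a small check. This sketch
follows the prose rather than Table 9.1, and it assumes, beyond
what the text states, that reaching the directory an operation
acts on is itself a lookup, so every directory on the way must
grant ReadDirectory. The Right flags and the operation kinds
"read", "write", and "change_acl" are hypothetical names.

```python
from enum import Flag, auto
from typing import List


class Right(Flag):
    READ_DIRECTORY = auto()
    WRITE_DIRECTORY = auto()
    MODIFY_ACL = auto()


# Right needed on the directory an operation acts on, by kind of operation.
REQUIRED = {
    "read": Right.READ_DIRECTORY,    # returns information from a directory
    "write": Right.WRITE_DIRECTORY,  # inserts, removes, or alters an entry
    "change_acl": Right.MODIFY_ACL,  # changes access rights of the named object
}


def authorized(kind: str, accessed: List[Right]) -> bool:
    """`accessed` holds the caller's rights on each directory touched, in path
    order; the last element is the directory the operation acts on."""
    if not accessed:
        return False
    *path, target = accessed
    # Assumption: every directory passed through on the way needs ReadDirectory.
    if not all(Right.READ_DIRECTORY in rights for rights in path):
        return False
    return REQUIRED[kind] in target
```

For a Lookup this reduces to the rule stated above: ReadDirectory
is required on every directory the lookup visits.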


Table 9.1 summarizes the access rights required for the
catalog operations.


Create(entry)
Create(link)
Create(external linkage)
Remove(entry)
Lookup
LookupWild
InitScan
ScanDirectory
Read(entry)
Change(entry)
Create(directory)      x
Remove(directory)      x


Table 9.1 Access Rights Required for Catalog Operations


9.4 Catalog Implementation


The following implementation issues are discussed below:


1. the manner in which client processes interact with the
   catalog managers which implement the catalog functions;

2. the use of Cronus data storage resources to implement the
   catalog data base.


9.4.2 Cronus Catalog Managers

There is a catalog manager process at each host that
maintains part of the catalog. It is the object manager for
objects of types CT_Cronus_Catalog, CT_Catalog_Entry,
CT_Directory, CT_Symbolic_Link, and CT_External_Linkage.

The catalog managers communicate with client processes by
means of the standard Cronus IPC facility. Since the catalog
hierarchy is distributed among Cronus hosts, different managers
will have direct access to different parts of the catalog. Some
catalog operations can be accomplished by a single catalog
manager and some require the cooperation of two or more catalog
managers. For example, the Remove(DirUID, catEntUID) operation
would normally be sent to the manager for directory DirUID, and
only that manager is required. The lookup operation may require
catalog managers on two hosts if the manager to which it is sent
does not contain the subtree required to interpret the entire
symbolic name.

A client process will not, in general, know which catalog
manager is the best one to perform a given operation. For this
reason, a client can initiate a catalog operation with any
catalog manager. If the manager selected can perform the
operation requested by itself, it will. If not, it will interact
with other managers as necessary to perform the operation.

9.4.3 Implementation of the Catalog Hierarchy

Directories are stored in files. The catalog manager
maintains a UID table for the objects it manages. Since the
principal objects implemented by the catalog manager are
directories, this table is called the Directory UID Table. The
Directory UID Table maps the UIDs for directories into their
object descriptors.


A directory contains zero or more catalog entries. The
catalog entry for an (inferior) directory contains the UID of
that directory. To access a directory given its UID, the catalog
manager uses the Directory UID Table to obtain the object
descriptor for the directory, and then uses the file UID in the
descriptor to access the file that holds the directory.
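
The two-step access path (directory UID to object descriptor,
then file UID to file) can be sketched as follows. The
DirectoryStore class and its methods are hypothetical, and a
dictionary stands in for requests to the file managers.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ObjectDescriptor:
    file_uid: int   # UID of the file that holds the directory's entries


class DirectoryStore:
    """Catalog-manager side of directory access (names are illustrative)."""

    def __init__(self, files: Dict[int, Dict[str, int]]) -> None:
        # Directory UID Table: directory UID -> object descriptor
        self.directory_uid_table: Dict[int, ObjectDescriptor] = {}
        # Stand-in for the file managers: file UID -> directory contents
        self._files = files

    def register(self, dir_uid: int, file_uid: int) -> None:
        self.directory_uid_table[dir_uid] = ObjectDescriptor(file_uid)

    def open_directory(self, dir_uid: int) -> Dict[str, int]:
        descriptor = self.directory_uid_table[dir_uid]  # step 1: UID -> descriptor
        return self._files[descriptor.file_uid]         # step 2: file UID -> file
```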


9.4.4 Distribution of the Catalog
9.4.4.1 Principles Affecting Distribution
Among the considerations influencing catalog distribution
are:

1. The catalog should not be stored at only one site.

   This is a reliability consideration.

   The catalog should be distributed, and it should probably
   be replicated in some fashion.

2. The entire catalog should not be stored at any single
   site.

   This is a scalability consideration.

3. It should be possible to access an object when the site
   that stores the object is accessible.

   This is a reliability consideration.

   Access to objects through the UID name space has this
   property since the information required to access an
   object, given its UID, is maintained by object managers.
   Access to objects through the symbolic name space should
   also exhibit it.

   The catalog entry for an object (or a copy of the entry)
   should be stored at the same site as the object. In
   addition, there should be enough information at the
   object site to control access to the object.

4. There is little utility in maintaining a catalog entry
   for an object in a more reliable fashion than the object
   itself.

   This is a common sense consideration.

   It is not necessary to replicate catalog entries for
   objects beyond that required by (3).

There are some further issues to consider associated with
(2) and (4), and we discuss them in more detail in the next two
subsections. The discussion includes elements of the
implementation of the reliable system as well as the primal
system, because these may impose constraints on the primal system
design.

9.4.4.2 Dispersal Of The Catalog
This section examines the requirement that the catalog not
be stored at a single site. The line of reasoning followed is
essentially that which led to the design of the Elan hierarchy
[BBN 3796].

Directories are the basic unit of distribution for the
Cronus catalog. Directories are implemented by Cronus primal and
reliable files. The lookup operation follows the components of a
symbolic name through a number of different directories, one for
each component in the name (assuming it does not encounter a
symbolic link). Unless there is a further restriction on the
dispersal of the catalog, each directory could be at a different
site from the previous one.


It is desirable to limit the number of sites that must be
visited in a lookup operation. Two useful restrictions are to:

1. Require that the catalog structure for entire subtrees
   below a certain cut (the “dispersal cut”) through the
   catalog tree be stored within a single site. We call a
   subtree that is rooted at the dispersal cut a
   “dispersal subtree”.

2. Require that the catalog structure above the dispersal
   cut be stored within a single site. We call the
   structure above the dispersal cut the “root portion” of
   the hierarchy.
Restriction 1 ensures that lookup operations within a
subtree that is below the dispersal cut can be confined to a
single site. Restriction 2 ensures that the task of determining
the site that stores a particular dispersal subtree can be
confined to the site that stores the root portion of the
hierarchy. As a result, lookup operations require at most two
catalog sites.
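
To make the "at most two sites" property concrete, the sketch
below computes which sites a lookup of a complete name visits
when it is started at a site that holds a copy of the root
portion. The function name and the map from dispersal-subtree
roots to sites are assumptions for illustration. The sketch also
reflects the forwarding behaviour described in 9.4.2: any manager
may accept the request and contacts at most one other manager.

```python
from typing import Dict, List


def sites_consulted(name: str, local_site: str,
                    subtrees: Dict[str, str]) -> List[str]:
    """Sites a lookup of complete `name` visits when started at `local_site`.

    `subtrees` maps the complete name of each dispersal-subtree root (a
    node just below the dispersal cut) to the single site that stores it.
    """
    parts = name[1:].split(":") if name != ":" else []
    for i in range(1, len(parts) + 1):
        prefix = ":" + ":".join(parts[:i])
        if prefix in subtrees:
            home = subtrees[prefix]
            # The replicated root portion is interpreted locally; the rest of
            # the name is interpreted entirely at the subtree's single site.
            return [local_site] if home == local_site else [local_site, home]
    return [local_site]   # the name lies entirely in the root portion
```

A lookup that stays in the root portion, or that lands in a
dispersal subtree stored at the starting site, is confined to one
site; every other lookup needs exactly one more.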


It is useful to add a third property to the dispersal of the
catalog:

3. The root portion of the catalog hierarchy should be
   replicated. Furthermore, a good way to replicate it is
   to maintain it at each site that maintains a part of
   the catalog (i.e., a dispersal subtree). The reasons
   for doing this are:

   o To distribute the load resulting from lookup
     operations among several sites.

   o To allow some lookup operations to be confined to
     a single site.

   o To increase the availability of the root portion
     of the hierarchy.


Figure 9.3 illustrates how a simple name hierarchy might be
dispersed among several hosts according to these three
restrictions.

For this to be practical, it must be possible to maintain
the copies of the root portion in a consistent fashion among the
same set of hosts that store parts of the catalog. It has been
observed that the root changes very slowly, because few users are
authorized to make changes, and because changes generally occur
as the result of the addition or deletion of a user or project.
This means that the maintenance mechanism need not be powerful
enough to handle the general multiple copy update problem.


9.4.4.3 Replication of Catalog Information


The primary consideration for replicating catalog
information is one of reliability. The objective is to ensure
that Cronus objects with symbolic names are accessible
symbolically whenever the sites that manage the objects are. It
is likely that unavailability of a catalog manager will be the
result of a host crash, so we can assure maximum access by
providing a copy of a catalog entry on the host where the object
resides. Then the entry will usually be available whenever
the object is. If this is the same as the site of the primary
catalog entry, then no replication is needed. If it is
different, then a secondary catalog entry is provided on the host
where the object resides.

For every host on which there are object managers, there
will be either a full catalog manager or a secondary catalog
manager. Each full catalog manager will maintain a fully
replicated root part of the catalog tree and its own subtrees
rooted at the dispersal cut. In addition, both full and
secondary catalog managers will maintain a separate database, the
secondary entry table, which contains secondary catalog entries
for objects which are on its host but for which the catalog
subtree containing them is not local.

A secondary entry is a catalog entry which stands in for the
primary entry for an object. It differs from the primary entry
in two respects. First, it can reside only on the host on which
the object resides, and then only if the primary entry is on a
different host. Second, it is stored in the secondary entry
table, not in a directory.

The secondary entry table is not used to speed up local
access to the object UID. Rather, it is available only to
support catalog operations when the primary entry is not
accessible. The reason for restricting its use is to avoid
synchronization problems between the primary and secondary
entries for an object during normal operation of the system when
no hosts are down. If the object has more than one symbolic
name, a copy of each catalog entry will be stored at the object's
host. That is, there will be a collection of Cronus catalog
entries at each host for those objects that have symbolic names
that require access to directories on other hosts. The catalog
manager software will maintain the consistency between these
secondary catalog entries and the primary entry.


Figure 9.4 illustrates how the catalog information will be
maintained. The circular nodes represent objects that are stored
at the same host as their entry in the catalog hierarchy and the
square nodes are used to represent catalog entries for objects
that are stored remotely from their entries.


Under normal conditions, the lookup operation uses the
primary entries in the symbolic catalog. When not all of the
directories are accessible, the secondary symbolic access path is
used. The lookup will succeed whenever the object itself can be
reached, since if the object has a symbolic name, a copy of the
catalog entry object will be stored at the site that manages the
object.(14)


When a client process first invokes a Lookup operation, the
operation is performed using only the primary catalog entries.
If that fails, the client may then attempt to perform a lookup
on the full symbolic name of the entry of interest. In this
second lookup attempt, the client must multicast the lookup
request to all catalog managers and set the key in the request
indicating a secondary access path lookup.


(14) Lookups of partial symbolic names cannot be performed using
the secondary access path, because the failure of the initial
lookup suggests that the catalog manager which can interpret the
Directory UID for the start of the search is unavailable. To use
the secondary access path, the client must remember the full
symbolic name for the entry. Further, the secondary access path
will not have the mechanism of symbolic links available. As a
result, a path name utilizing such a link will also fail.


[Figure 9.4: Secondary Symbolic Access Path. The figure shows the
primary access path through the replicated root portion of the
name hierarchy, the dispersal cut, catalog entry copies on Hosts
C, D, F, and G, and the secondary access path.]


Lookup by means of the primary path is much more efficient
since it is directed, whereas lookup by means of the secondary
path is undirected. There is no a priori knowledge of the host
or hosts that need to be consulted to perform a lookup by the
secondary path.
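
The two-stage lookup described above, a directed primary lookup
followed by an undirected search of secondary entry tables, can
be sketched as follows. The multicast is modelled as a loop over
the reachable managers' tables, keyed by full symbolic name; the
function and exception names are hypothetical. Because only exact
full names are matched, the sketch also reflects the restrictions
in footnote (14): partial names and names that pass through links
cannot be resolved this way.

```python
from typing import Callable, Dict, List, Optional


class PrimaryLookupFailed(Exception):
    """Raised when the directed lookup through primary entries fails."""


# Secondary entry table of one catalog manager: full symbolic name -> object UID
SecondaryTable = Dict[str, int]


def lookup_with_fallback(name: str,
                         primary: Callable[[str], int],
                         reachable: List[SecondaryTable]) -> Optional[int]:
    """Return the UID for `name`, or None if neither path finds it."""
    try:
        return primary(name)            # directed: follows primary catalog entries
    except PrimaryLookupFailed:
        pass
    if not name.startswith(":"):
        return None                     # partial names cannot use the secondary path
    # Undirected: every reachable manager is asked (modelling the multicast),
    # and only an exact full-name match in a secondary entry table counts.
    for table in reachable:
        uid = table.get(name)
        if uid is not None:
            return uid
    return None
```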


9.4.4.4 Synchronization Among Catalog Managers


There are two cases in which catalog managers must
synchronize among themselves in order to preserve consistent
information: the replication of the catalog hierarchy above the
dispersal cut, and the correspondence between the primary and
secondary entries for objects which reside on one host and have
their primary catalog entry on another.

In addition, there are two aspects of the synchronization
problem: the first is the synchronization among hosts which are
all running, the second between a host which has changed the
catalog and a host which is reintegrating into the Cronus cluster
after a period of inaccessibility.
Rita neta 
This section discusses techniques for automating replication wey 
of the root portion of the Cronus hierarchy (i.e, that @ 
above the dispersal cut). While the approach discussed applies TREN 
to the Cronus catalog. 1t 1S also intended to be used as a mia 
bth 
base for more general replication services that might be applied SUR RMK 
to other Cronus components (the authentication manager, file SAN 
managers, etc.) COOCAIST 
e 

As with all Cronus functions, automation of catalog
replication will be implemented by the object managers.
Initially, we can think of the functions needed to implement this
automation as being composed of the basic operations on an
object type. Later it might be appropriate to cast them
as new operations. In any case, we will refer to these
functions as operations irrespective of their actual
implementation as operations on a type. In the case of the
catalog, these replication functions will be handled by the
Catalog Managers, rather than in a more general way such
as through some form of replicated file. Eventually, when we
gain more experience with replication, we may want to provide a
more generic way of providing replication services.

We define the following basic operations:

  o Replicate existing directory
  o Dereplicate existing directory
  o Modify existing directory (add, delete, modify entry)
  o Reintegrate host

In addition to these operations, we can add two more functions
related to management of the replicated portion of the catalog:

  o Move dispersal cut (or replicate/dereplicate above/below
    directory)
  o Copy dispersal (make a copy of the entire dispersal hierarchy)

In order to simplify the design, we will restrict ourselves to
these functions. Other variants, such as creating a new
replicated directory, can be implemented from these and the
existing catalog operations in the obvious manner.

Our approach to maintaining consistency in the
replicated portion of the hierarchy will be to use update
logs that are maintained and accessed by the Catalog Managers.


We will discuss the management of updates in more detail 
later, when we discuss reintegration. 


Before discussing the operations, recall that all
directories in the hierarchy above the dispersal cut are
replicated on all hosts. Below the dispersal cut, each
subtree of the catalog hierarchy is maintained on a single
host. This ensures high availability of the root portion of the
catalog and a minimal number of inter-host accesses in a
directory search. The catalog is designed to accommodate
infrequent changes to the root portion of the hierarchy, so speed
of update is not a major issue.


In Figure 9.5 we see a detailed representation of the
replication of the root portion of the catalog hierarchy on two
hosts, A and B. Note that the directories above the
dispersal cut are truly replicated, having the same directory
UIDs. The reader should not confuse the implementation of
the catalog (as files with different file UIDs maintained by
the file managers) with the replication of the catalog itself
(directories with the same UIDs maintained by catalog
managers). The reader should also remember that the contents
of the replicated directories are also replicated (e.g., they
have the same entries), and that they have location independent
semantics. That is, the entries consist of a symbolic name that
is known globally (through the catalog) and a UID that is
known globally (through the operation switch). With this
background, we can now go on to discuss the operations in more
detail.


9.4.4.5 Replicate 


The replicate function takes a specified non-replicated
directory and replicates it throughout the configuration. That
is, a copy of the directory and all its entries will be created
by the Catalog Manager on each host in the configuration with the
same UID as the source. The function is restricted in that only
the root directory or children of replicated directories can be
replicated. To ensure consistency, a copy of the newly
replicated directory is made available by the Catalog Manager
only when the operation has been completed locally. Thus, only
when the new directory is allocated and its entries are copied is
it made visible by inserting its UID into the Catalog Manager's
UID table. Each copy of the directory is also marked as being
replicated to assist the Catalog Manager in its future
management. The operation is managed by the Catalog Manager of
the source directory, which communicates directly with all the
other Catalog Managers in the configuration to complete the
operations on their hosts.


The initiating Catalog Manager logs the replicate operation to allow reintegration of hosts that cannot complete the replication immediately. We will discuss update log maintenance and reintegration later. For now, we note that a log entry is created for the operation. Hosts that have not completed it will use the log during reintegration at a later time.



Figure 9.5: Replication in the Cronus Catalog

The following pseudo-code describes the algorithm:

    REPLICATE DIR
        IF DIR ALREADY REPLICATED OR PARENT NOT REPLICATED THEN
            ERROR
        ELSE
            LOG REPLICATE DIR
            MARK LOCAL DIR REPLICATED
            FOR ALL REMOTE CATALOG MANAGERS
                CREATE REPLICATED COPY OF DIR
                /* CREATE, COPY ENTRIES, MARK REPLICATED, MAKE VISIBLE */
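To make the ordering in REPLICATE DIR concrete, here is a minimal single-process sketch in Python. It rests on several assumptions. The names `Directory`, `CatalogManager`, `create_replicated_copy` and `replicate` are hypothetical. Remote Catalog Managers are modeled as ordinary objects rather than IPC peers, and the log is a plain list. Following the prose, the root directory is also accepted. The key point is the sequence: log first, mark the local directory, then have each remote manager build its copy completely before the UID enters its UID table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Directory:
    uid: int
    parent_uid: Optional[int]  # None for the root directory
    entries: Dict[str, int] = field(default_factory=dict)  # symbolic name -> UID
    replicated: bool = False


class CatalogManager:
    """Hypothetical in-process stand-in for one host's Catalog Manager."""

    def __init__(self, host: str):
        self.host = host
        self.uid_table: Dict[int, Directory] = {}  # only visible directories appear here

    def create_replicated_copy(self, source: Directory) -> None:
        # Allocate the copy, copy its entries, and mark it replicated...
        copy = Directory(source.uid, source.parent_uid, dict(source.entries), replicated=True)
        # ...and only then make it visible by inserting its UID.
        self.uid_table[copy.uid] = copy


def replicate(local: CatalogManager, remotes: List[CatalogManager],
              uid: int, log: List[Tuple[str, int]]) -> None:
    d = local.uid_table[uid]
    parent = local.uid_table.get(d.parent_uid) if d.parent_uid is not None else None
    # Only the root or a child of a replicated directory may be replicated.
    parent_ok = d.parent_uid is None or (parent is not None and parent.replicated)
    if d.replicated or not parent_ok:
        raise ValueError(f"directory {uid} cannot be replicated")
    log.append(("REPLICATE", uid))  # LOG REPLICATE DIR
    d.replicated = True             # MARK LOCAL DIR REPLICATED
    for cm in remotes:              # FOR ALL REMOTE CATALOG MANAGERS
        cm.create_replicated_copy(d)
```

Writing the log entry before touching any remote host means an interrupted replication still leaves a record for reintegration to finish.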
This method raises several issues besides log file management. First, the algorithm requires a database of all hosts in a configuration that run the replication service. The database should be distributed on all hosts for efficiency and availability.

The second issue is whether the remote replications should be managed synchronously (waiting for the remote operation to complete) or asynchronously (telling the remote Catalog Manager to start the operation and not waiting for completion). If the operation is synchronous, the time to complete depends on how long the slowest remote operation takes. For a large configuration this could be a problem, and a time-out will be required for hosts that are down or cannot respond. With asynchronous management, it is hard for the originator to know when, and if, the operation was completed. It also places more of the burden on the reintegration procedure for making sure the operation is carried out successfully. One possibility in the asynchronous case is for the target to acknowledge the start of the operation, so the originator does not wait for completion.


The issue here is the definition of when an operation is complete. Strictly, an operation is complete only when all hosts in the configuration have successfully completed it. However, it may be sufficient for the initiator to consider an operation "complete" once it has been successfully logged and all running hosts in the configuration have been notified to start it. The reintegration procedure will presumably cause the operation to be completed on all hosts eventually, so relying on it is probably adequate. Thus, the initiator's responsibility is to:

a) log the operation;
b) notify all running hosts in the configuration to start the operation; and
c) complete the operation on the local host.

Once the operation is successfully logged, we assume that the reintegration procedure will eventually complete it on all hosts. This holds even if any of the hosts (including the initiator) crashes in the midst of the operation.


The only problem with this approach arises when a host cannot complete the operation because of problems such as lack of resources (e.g., no space to add new directories). In these cases, the best solution is probably to notify the operator of the resulting inconsistency, through error logging or the monitoring and control system, so that the problem can be resolved manually. The reintegration procedure can still be used to retry the operation at a later time. In some instances, however, operator intervention will presumably be required to clear up the cause of the problem.


Another issue in the design of replication functions is maintenance of the secondary catalog database. Recall how accessibility of symbolically named objects is maintained. If an object resides on a different host from its primary catalog entry, a secondary catalog entry is kept on the host where the object resides. Objects therefore remain accessible symbolically through this secondary path even if the primary path is unavailable. In the replicated portion of the catalog, however, the secondary database is unnecessary, because the catalog information is already available on all hosts in the configuration.
9.4.4.6 Dereplicate

The dereplicate function takes a specified replicated directory and removes all copies of it except the one on the host of the originator, which can be any host in the configuration. Dereplicate applies only to replicated directories whose children are not replicated. The algorithm resembles replicate in that it takes place in two steps: first the directory is made invisible on the remote host, then the remote copy is removed. The following pseudo-code summarizes the operation.

    DEREPLICATE DIR
        IF DIR NOT REPLICATED OR ANY CHILDREN REPLICATED THEN
            ERROR
        ELSE
            LOG DEREPLICATE DIR
            MARK LOCAL DIR DEREPLICATED
            FOR ALL REMOTE CATALOG MANAGERS
                DEREPLICATE DIR /* MAKE INVISIBLE, DELETE DIR */
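DEREPLICATE DIR can be sketched the same way. This is a minimal in-process model under stated assumptions. The names `Directory`, `CatalogManager`, `remove_copy` and `dereplicate` are hypothetical. Each manager keeps two tables: `uid_table`, which controls visibility, and `storage`, which stands in for the directory files. Keeping them separate shows the two-step removal: make invisible first, then delete.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Directory:
    uid: int
    parent_uid: Optional[int]
    replicated: bool = False


class CatalogManager:
    """Hypothetical in-process stand-in for one host's Catalog Manager."""

    def __init__(self, host: str):
        self.host = host
        self.uid_table: Dict[int, Directory] = {}  # visible directories
        self.storage: Dict[int, Directory] = {}    # directory files held on this host

    def add(self, d: Directory) -> None:
        self.storage[d.uid] = d
        self.uid_table[d.uid] = d

    def remove_copy(self, uid: int) -> None:
        self.uid_table.pop(uid, None)  # first make the copy invisible...
        self.storage.pop(uid, None)    # ...then delete it


def dereplicate(local: CatalogManager, remotes: List[CatalogManager],
                uid: int, log: List[Tuple[str, int]]) -> None:
    d = local.uid_table[uid]
    children_replicated = any(c.parent_uid == uid and c.replicated
                              for c in local.uid_table.values())
    if not d.replicated or children_replicated:
        raise ValueError(f"directory {uid} cannot be dereplicated")
    log.append(("DEREPLICATE", uid))  # LOG DEREPLICATE DIR
    d.replicated = False              # MARK LOCAL DIR DEREPLICATED
    for cm in remotes:                # FOR ALL REMOTE CATALOG MANAGERS
        cm.remove_copy(uid)
```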
One issue with dereplicate is how to preserve the property that each subtree of directories below the dispersal cut is contained on a single host. One solution would be to require this condition to hold before a directory can be moved below the dispersal cut (dereplicated). This would require manual reorganization of the directory hierarchy before dereplication. Another approach would be to relax this constraint and allow the dereplication to take place anyway. As an optimization, the hierarchy could later be reorganized manually to meet the condition.
9.4.4.7 Modify

The operations that modify a replicated directory (add, delete, change) follow the same pattern as replicate and dereplicate. Each adds the operation to a log file and notifies all the remote Catalog Managers to complete the operation.

Modification of existing directories presents a more severe synchronization and locking problem than replication and dereplication. For replication and dereplication, atomicity of the operations ensures consistency. The directory is always available somewhere by virtue of the Cronus IPC system (i.e., UIDs are location independent), even before it is fully replicated. Modification, on the other hand, could lead to inconsistency in two cases: if the operation is not completed successfully, or if simultaneous modifications to the same directory are attempted.

Clearly, some form of concurrency control is needed to prevent conflicts and inconsistencies. Changes to the root portion of the hierarchy occur infrequently. We can therefore prevent conflicts (simultaneous changes to the same entry) by locking the root portion whenever a change is made, so that only one change can occur at any time. Since modification of the root hierarchy is an administrative function, this is probably acceptable.


Inconsistency in the root portion of the hierarchy is a different problem. It results from latency in completing the operation across all copies of the hierarchy, which leaves periods during which the directories have different contents. Whether this matters in practice depends on how frequently changes to the root portion are made.


9.4.4.8 Update 


So far, we have avoided the problem of hosts that cannot complete replication operations. A host may be down during an operation, or isolated by a network failure or partition. We have mentioned that our approach to reintegration is a pending-actions log, in which each operation is recorded until it has been completed by all hosts in the configuration. We now discuss the details of reintegration and log file management.


The basic idea is that there is some log file accessible to the Catalog Managers on all hosts. An entry is made in the log for each replication operation. When a host comes back up, its Catalog Manager reads the log file before it accepts any new requests. For each log entry not yet completed by that host, the Catalog Manager completes the indicated operation and marks the operation complete in the entry. When all hosts in the configuration have completed the indicated operation, the entry is freed for garbage collection.


Each entry in the log file consists of three parts:

o an operation code (replicate/dereplicate directory, add/delete/modify directory entry);
o arguments to the operation (directory UID or actual entry contents); and
o a vector of operation-done bits, one for each host in the configuration.

When a host completes the operation, its Catalog Manager sets that host's completion bit at reintegration time. If all the bits are set and the entry is the last one in the log file, the Catalog Manager can truncate the file. Garbage collection is done by a daemon process that runs periodically to trim the log file. The assumption is that the log file will normally be empty (i.e., all operations completed). In any case, as long as all the hosts in the configuration eventually come up, the log file will eventually be trimmed.
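The log entry layout and the truncation rule can be modeled in a few lines. This is a minimal in-memory sketch, not the Cronus log format. The names `LogEntry`, `UpdateLog`, `mark_done` and `trim` are hypothetical, and a host's done bit is assumed to sit at that host's position in the configuration list. Following the text, `trim` removes an entry only when every bit is set and the entry is the last one in the log. A completed entry in the middle waits until everything after it has also completed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LogEntry:
    opcode: str       # e.g. "REPLICATE_DIR", "ADD_ENTRY"
    args: Tuple       # directory UID or actual entry contents
    done: List[bool]  # one operation-done bit per host in the configuration

    def complete(self) -> bool:
        return all(self.done)


class UpdateLog:
    """Hypothetical in-memory model of the pending-actions log."""

    def __init__(self, hosts: List[str]):
        self.hosts = list(hosts)  # position in this list = bit position
        self.entries: List[LogEntry] = []

    def append(self, opcode: str, args: Tuple) -> LogEntry:
        entry = LogEntry(opcode, args, [False] * len(self.hosts))
        self.entries.append(entry)
        return entry

    def mark_done(self, entry: LogEntry, host: str) -> None:
        entry.done[self.hosts.index(host)] = True

    def trim(self) -> None:
        # Garbage-collection daemon step: truncate while the last entry is complete.
        while self.entries and self.entries[-1].complete():
            self.entries.pop()
```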


Initially, at least, there will be a single central log file, accessed through a global UID known to each Catalog Manager. Admittedly, this is a weakness in the mechanism: if the log file becomes inaccessible, updates to the hierarchy cannot be done. In the future this can be addressed by replicating the log file on multiple hosts, or by using the persistence database in each Catalog Manager to ensure the operation's completion.


The central log file can also serve as a lock on the hierarchy to serialize the updates. Any access to the log file must be exclusive, which prevents synchronization problems in updating either the log file or the replicated portion of the hierarchy itself. Whenever a Catalog Manager attempts a replication operation, it first tries to open the log file for exclusive access. When the initiating Catalog Manager has completed the entry, it releases the log file. Again, most hierarchy updates are infrequent, which should make this acceptable.
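The use of exclusive open as a serialization lock can be illustrated with a small sketch. This is an assumption-laden model. A `threading.Lock` in one process stands in for exclusive access to the central log file, and the names `CentralLog`, `try_open_exclusive` and `run_replication_op` are hypothetical. A Catalog Manager that cannot obtain the log simply reports failure so it can retry later.

```python
import threading
from typing import List, Tuple


class CentralLog:
    """Hypothetical stand-in for the central log file with exclusive open."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.entries: List[Tuple] = []

    def try_open_exclusive(self) -> bool:
        # Non-blocking: returns False if another Catalog Manager holds the log.
        return self._lock.acquire(blocking=False)

    def release(self) -> None:
        self._lock.release()


def run_replication_op(log: CentralLog, op: Tuple) -> bool:
    """Open the log exclusively, record the entry, then release the log."""
    if not log.try_open_exclusive():
        return False  # hierarchy update already in progress elsewhere; retry later
    try:
        log.entries.append(op)
        return True
    finally:
        log.release()
```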
9.4.4.9 Failure Analysis


Let us look at the types of failure that can occur in maintaining the replicated database.

o A host can be down when the operation is started. In this case, the reintegration procedure will cause the operation to be completed when the host restarts.

o A communication failure can isolate a host from others on the network. We assume the effects are similar to the previous case. Presumably, if one host on the LAN cannot communicate with another, it is isolated from all others in the configuration.

o A host may be unable to complete an operation because of resource limitations or some other cause unrelated to total host failure or isolation. As we mentioned earlier, the best we can do here is to report the error and wait for manual intervention to clear the source of the problem and fix the inconsistency.

The latter two cases argue for running the reintegration procedure periodically, even if the host has not crashed. This restores consistency to the database after a transitory failure in communications, a temporary resource limitation, and so on.

A different type of failure occurs when a host crashes in the middle of an operation. Here, we want to avoid partial or incomplete results and ensure that the operation is eventually completed correctly when the host restarts. Three mechanisms protect these operations from the results of such crashes. First, the initiator logs the operation as early as possible, so that the reintegration procedure can recover from any subsequent host crash. Similarly, hosts completing the operation do not mark it complete until it has actually been performed.


Second, the effects of the operation are made visible only after the result is valid, to avoid partial or unusable results.

Finally, enough information is available for the manager to verify whether the operation has already been completed. This covers the case of a crash before the operation has been marked done in the log, and keeps a host from performing a completed operation more than once. Thus, when a host reads the log file, it must verify that the indicated operation has not already been performed:

o For replicates or dereplicates of directories, it can check whether the indicated directory already exists (or doesn't) and is marked as replicated (or not).

o For addition or deletion of entries in replicated directories, it can similarly search for the existence (or absence) of the indicated entries.

o For modifications of a replicated entry, the log entry must contain both the old and the new entries, so that the host can determine whether the modification was completed.

To summarize, the following describes the general form of the operations from the point of view of the initiator, the remote hosts, and the reintegration procedure.

    INITIATOR:
        LOG OPERATION
        NOTIFY REMOTE CMs OF OPERATION
        COMPLETE OPERATION LOCALLY

    REMOTE:
        COMPLETE OPERATION LOCALLY
        MARK OPERATION DONE IN LOG

    UPDATE:
        LOOP: READ LOG ENTRY
            IF OPERATION NOT MARKED DONE BY THIS HOST THEN
                VERIFY OPERATION NOT DONE
                IF NOT DONE THEN
                    COMPLETE OPERATION LOCALLY
                FI
                MARK OPERATION DONE IN LOG
            FI
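The three roles above can be tried out with a compact Python sketch. Several simplifications apply. Everything runs in one process, and notification is a direct call: a synchronous stand-in for what would be an asynchronous message. Each host holds a single replicated directory. The names `Host`, `already_done`, `initiator`, `remote` and `update` are hypothetical. The verification step inspects the directory itself, as the text requires: existence for ADD, absence for DELETE, and the new value for MODIFY, whose log record carries both old and new values. One further assumption: the initiator sets its own done bit after completing locally.

```python
from typing import Dict, List, Tuple

# ("ADD", name, uid) | ("DELETE", name) | ("MODIFY", name, old_uid, new_uid)
Op = Tuple


class Host:
    """Hypothetical host holding one replicated directory's entries."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.entries: Dict[str, int] = {}  # symbolic name -> UID


def already_done(host: Host, op: Op) -> bool:
    """VERIFY OPERATION NOT DONE by examining the directory contents."""
    kind = op[0]
    if kind == "ADD":
        return op[1] in host.entries
    if kind == "DELETE":
        return op[1] not in host.entries
    if kind == "MODIFY":
        return host.entries.get(op[1]) == op[3]
    raise ValueError(f"unknown operation {kind!r}")


def complete_locally(host: Host, op: Op) -> None:
    if op[0] in ("ADD", "MODIFY"):
        host.entries[op[1]] = op[-1]  # last field is the (new) UID
    elif op[0] == "DELETE":
        host.entries.pop(op[1], None)


def remote(entry: dict, host: Host) -> None:
    complete_locally(host, entry["op"])  # COMPLETE OPERATION LOCALLY
    entry["done"].add(host.name)         # MARK OPERATION DONE IN LOG


def initiator(log: List[dict], me: Host, hosts: List[Host], op: Op) -> None:
    entry = {"op": op, "done": set()}
    log.append(entry)                    # LOG OPERATION
    for h in hosts:                      # NOTIFY REMOTE CMs (running hosts only)
        if h is not me and h.up:
            remote(entry, h)
    complete_locally(me, op)             # COMPLETE OPERATION LOCALLY
    entry["done"].add(me.name)           # assumption: initiator sets its own bit


def update(log: List[dict], me: Host) -> None:
    for entry in log:                    # LOOP: READ LOG ENTRY
        if me.name not in entry["done"]:
            if not already_done(me, entry["op"]):
                complete_locally(me, entry["op"])
            entry["done"].add(me.name)   # MARK OPERATION DONE IN LOG
```

Because `update` verifies before acting, it can be run periodically, or rerun after a crash between performing and marking an operation, without applying anything twice.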

9.4.4.10 Other Operations

Earlier, we referred to two other functions that are important in the practical administration of the replicated root portion of the catalog hierarchy. The first, move dispersal cut, can be thought of as a compound replicate/dereplicate operation. Given a directory in the hierarchy, it moves the dispersal cut to include that directory in the replicated portion by doing the appropriate replicate or dereplicate operations on the intervening directories. Conceptually, this amounts to traversing the hierarchy and performing the individual replicate or dereplicate operations. Operationally, this function may be quite dangerous, so thought must be given to protecting it suitably.
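One way to picture the traversal is the following sketch. It assumes the hierarchy is given as a hypothetical `parent_of` map from directory UID to parent UID (None for the root) and that the replicated portion is a set of UIDs. `move_cut_to_include` and `move_cut_to_exclude` are illustrative names only. The orderings are chosen to respect the preconditions stated earlier. Replication proceeds top-down, so each parent is replicated before its child. Dereplication proceeds deepest-first, so no directory is dereplicated while a child is still replicated.

```python
from typing import Dict, List, Optional, Set, Tuple


def depth(parent_of: Dict[int, Optional[int]], uid: int) -> int:
    n = 0
    while parent_of[uid] is not None:
        uid = parent_of[uid]
        n += 1
    return n


def is_descendant(parent_of: Dict[int, Optional[int]], uid: Optional[int], ancestor: int) -> bool:
    while uid is not None:
        if uid == ancestor:
            return True
        uid = parent_of[uid]
    return False


def move_cut_to_include(parent_of: Dict[int, Optional[int]], replicated: Set[int],
                        target: int) -> List[Tuple[str, int]]:
    """Replicate every non-replicated directory on the path down to target."""
    path = []
    d: Optional[int] = target
    while d is not None and d not in replicated:
        path.append(d)
        d = parent_of[d]
    ops = []
    for uid in reversed(path):  # outermost first: parent before child
        replicated.add(uid)
        ops.append(("REPLICATE", uid))
    return ops


def move_cut_to_exclude(parent_of: Dict[int, Optional[int]], replicated: Set[int],
                        target: int) -> List[Tuple[str, int]]:
    """Dereplicate target and all its replicated descendants, deepest first."""
    doomed = [u for u in replicated if is_descendant(parent_of, u, target)]
    doomed.sort(key=lambda u: (-depth(parent_of, u), u))
    ops = []
    for uid in doomed:
        replicated.discard(uid)
        ops.append(("DEREPLICATE", uid))
    return ops
```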
The other function relates to adding a new host to a configuration. This task involves a few issues:

o The host must be added to the configuration database, so that other managers can identify it as running the replication service.

o It must be able to get a copy of the replicated portion of the hierarchy from another manager. This is similar to the action required in replicating a directory: one of the Catalog Managers would walk down the root portion of the hierarchy and send a copy of each replicated directory to the new host. This is presumably done infrequently, and before the new host is supporting users, so performance and synchronization do not seem to be major problems.

o The update log file must be reformatted to reflect the new host in the configuration (i.e., new entries must accommodate the new host in the vector of operation-done bits).

A similar inverse set of operations must be done to remove hosts from the configuration. The host being removed must be taken out of the configuration database, and the log file must be updated to account for the host no longer participating in the replication service.
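The reformatting of the done-bit vectors might look like the following sketch. The `UpdateLog` class and its methods are hypothetical. Two policies here are assumptions, not taken from the text. For entries already in the log, the new host's bit starts cleared; this is safe because reintegration verifies an operation before applying it. Removing a host drops its bit from every entry, so entries that were waiting only on that host become complete and can be trimmed.

```python
from typing import List, Tuple


class UpdateLog:
    """Hypothetical log whose entries carry one done bit per configured host."""

    def __init__(self, hosts: List[str]):
        self.hosts = list(hosts)
        self.entries: List[Tuple[str, tuple, List[bool]]] = []

    def append(self, opcode: str, args: tuple) -> None:
        self.entries.append((opcode, args, [False] * len(self.hosts)))

    def add_host(self, host: str) -> None:
        self.hosts.append(host)
        for _, _, done in self.entries:
            done.append(False)  # assumption: new host re-verifies pending entries

    def remove_host(self, host: str) -> None:
        i = self.hosts.index(host)
        del self.hosts[i]
        for _, _, done in self.entries:
            del done[i]         # the departed host no longer holds up trimming
```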
10 Input/Output


10.1 Introduction 

Devices such as line-printers, tape-drives, or terminals are integrated into the Cronus system as sub-types of a generalized I/O object, CT_IOStream, which supports a generalized set of I/O operations.


We have tried to generalize the input-output operations so that similar operations on different types of objects are as similar as possible. Programs and programmers should not be burdened with special-case software that depends on whether the output is a terminal, a printer, a disk file, or the standard input of another program. There are places where these similarities break down, as discussed below. The special-case software is isolated in the PSL, so the Cronus applications programmer will be largely shielded from these details.


10.2 Operations on devices 


Devices are objects of type CT_Device, which is a subtype of type CT_IOStream and implements the standard operations of that type:

    Open
    Close (15)
    IOLock
    Read
    Write
    IOStreamsOpenBy
    OpenStatusOf
    CloseProcessOpenIOStreams
    CloseAllProcessOpenIOStreams


(15) Open and close are used for synchronization. They are also used to trigger actions that many device managers will wish to perform when the device gets accessed (e.g., hanging up a modem when the last process closes its output to the terminal, or issuing a form-feed when a process opens the lineprinter).
In addition to these operations, device objects implement a number of special-purpose operations. For example, a tape drive or a disk drive has a Seek operation, which allows reading or writing to begin at a particular position in the device's medium. (16) The details of individual device-object operations will be specified as actual devices are added to the Cronus cluster. (17)


We anticipate a hierarchy of object types, making finer and finer distinctions, for example CT_IOStream > CT_Device > CT_printer > CT_lineprinter. Just as there are several kinds of I/O-stream objects, there may be many kinds of lineprinter object, perhaps one for each kind of lineprinter. There may also be page printers and graphics printers.
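The subtype chain can be mirrored with ordinary class inheritance. In this sketch each type lists only the operations it adds, and `operations()` gathers them from the most general type down. The standard operation names come from the list in Section 10.2, and Seek from the text. `CT_tapedrive`, `SetPageLength`, `SetPageWidth`, `SetFont` and `SetDensity` are hypothetical names, loosely based on the controls mentioned in footnote 16. This is an illustration of the type relationship, not the Cronus type system.

```python
class CT_IOStream:
    # Standard I/O-stream operations (Section 10.2).
    OWN_OPERATIONS = ("Open", "Close", "IOLock", "Read", "Write",
                      "IOStreamsOpenBy", "OpenStatusOf",
                      "CloseProcessOpenIOStreams", "CloseAllProcessOpenIOStreams")

    @classmethod
    def operations(cls):
        """Collect operations from the most general type down to this one."""
        ops = []
        for klass in reversed(cls.__mro__):
            ops.extend(klass.__dict__.get("OWN_OPERATIONS", ()))
        return ops


class CT_Device(CT_IOStream):
    OWN_OPERATIONS = ()  # devices implement the standard operations


class CT_printer(CT_Device):
    OWN_OPERATIONS = ("SetPageLength", "SetPageWidth", "SetFont")  # hypothetical names


class CT_lineprinter(CT_printer):
    OWN_OPERATIONS = ()


class CT_tapedrive(CT_Device):  # hypothetical type name
    OWN_OPERATIONS = ("Seek", "SetDensity")  # SetDensity is hypothetical
```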


Device object managers will also commonly refuse a request for "frozen" access. Besides exclusivity of access, frozen access provides the ability to cancel the writes that have been done to the object. That ability cannot be implemented on devices in any meaningful way, so the device's manager does not allow this form of access. (18) One may, of course, open devices for exclusive access.
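A device manager's access check might look like this sketch. The `DeviceManager` class and the mode strings "read", "write", "exclusive" and "frozen" are hypothetical, and the sharing rules are simplified. The point is only that frozen access is refused outright, while exclusive access is granted when no other process has the device open.

```python
from typing import Optional, Set


class DeviceManager:
    """Hypothetical device manager enforcing the access rules described above."""

    # "frozen" is deliberately absent: writes to a device cannot be cancelled.
    SUPPORTED_MODES = {"read", "write", "exclusive"}

    def __init__(self) -> None:
        self.exclusive_holder: Optional[str] = None
        self.openers: Set[str] = set()

    def open(self, process: str, mode: str) -> None:
        if mode not in self.SUPPORTED_MODES:
            raise PermissionError(f"{mode!r} access is refused by this device manager")
        if self.exclusive_holder is not None and self.exclusive_holder != process:
            raise PermissionError("device is open for exclusive access")
        if mode == "exclusive":
            if self.openers - {process}:
                raise PermissionError("device is already open by another process")
            self.exclusive_holder = process
        self.openers.add(process)

    def close(self, process: str) -> None:
        self.openers.discard(process)
        if self.exclusive_holder == process:
            self.exclusive_holder = None
```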


(16) Other special operations that individual device managers are likely to implement include density and format control for tape and disk drives, software control to take devices off-line, and page-length, page-width, and font controls for printers.

(17) The special operations on terminal devices are discussed in Section 11.

(18) At some later date we might explore making some device managers clever enough to provide their own spooling. One could then do frozen writes with the ability to cancel the writes. Such cleverness would likely lead to a number of special-purpose (spooling-oriented) operations, such as "perform output after a specific time". It might seem that such cleverness belongs in a program rather than in a device manager, but for efficiency reasons one might wish to eliminate the middle-man.

For example, a file to be spooled for printing, the requesting process, and the printer manager may all reside on different machines. There is little point in passing the data from the file through the network to the requesting program, and then back through the network to the printer manager, when the data could go straight from the file to the printer manager. Thus, a printer object manager may implement a "spool for printing" operation which takes the UID of the file to be printed as a parameter. Probably the act of spooling itself should be treated as an object and given its own UID. Suggested operations on spool objects are Create (to get a UID for subsequent transactions), Remove (to cancel a spooled action), and TimeToBegin (to set the time for the spooled action to take place), as well as the usual printer-oriented operations (header format, font, etc.).


10.3 Implementation overview


For each device object on a host there is a manager for the device. A device manager may manage multiple devices or a single device. For example, a host might have only one line-printer manager for all of its lineprinters, or a single manager for both tape-drives and disk-drives (19). Which approach is taken depends entirely on the implementation and is not within the scope of this document. When started, the device manager registers the UIDs for its devices with the operation switch on its host, so that the Cronus IPC mechanism delivers operations on the device object appropriately.

(19) Exotic as this may sound, it is easy to imagine a single manager for DEC-Tape drives and disk drives, for example.


10.3.1 The use of large messages for device I/O


We expect that most device I/O will be done through a stream interface, as supported by Cronus large messages, to avoid passing all the I/O messages through the operation switch. This implementation differs from primal files, for example, because we expect the two kinds of object managers to be implemented in fundamentally different ways. For devices such as line-printers, terminals and tape-drives, it seems realistic to expect one manager process per physical device. The primal file system is accessed by many processes at one time, whereas an individual device is typically a limited-access entity: users typically require exclusive access to a device while they are using it. We therefore expect a device manager to be able to maintain a stream connection to everyone who wants to talk to its object. By contrast, very few constituent operating systems would let a process hold as many open network connections as a primal file manager would need. We therefore expect I/O from primal files to be datagram-based rather than connection-based. I/O from devices, in contrast, may be connection-oriented, bypassing the operation switch for reasons of efficiency.
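The start-up registration described in Section 10.3 can be pictured as a per-host table from device UID to managing process. This is a minimal sketch under stated assumptions. `OperationSwitch`, `register`, `deliver`, `LinePrinterManager` and `invoke` are hypothetical names. Only one operation, Write, is modeled, and the manager simply queues the data.

```python
from typing import Dict, List


class OperationSwitch:
    """Hypothetical per-host table mapping object UIDs to their managers."""

    def __init__(self) -> None:
        self._managers: Dict[int, object] = {}

    def register(self, uid: int, manager: object) -> None:
        if uid in self._managers:
            raise ValueError(f"UID {uid} is already registered")
        self._managers[uid] = manager

    def deliver(self, uid: int, operation: str, *args):
        # Route an operation on a device object to the manager that registered it.
        return self._managers[uid].invoke(uid, operation, *args)


class LinePrinterManager:
    """One manager for several lineprinters, as the text allows."""

    def __init__(self, switch: OperationSwitch, device_uids: List[int]):
        self.queued: Dict[int, List[str]] = {uid: [] for uid in device_uids}
        for uid in device_uids:  # on start-up, register every managed device
            switch.register(uid, self)

    def invoke(self, uid: int, operation: str, *args):
        if operation == "Write":
            self.queued[uid].append(args[0])
            return len(args[0])
        raise NotImplementedError(operation)
```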


10.3.2 Reasonable defaults for unspecified options


To provide uniformity of access, device managers assign reasonable defaults to their device-specific parameters (e.g., tape density) whenever the application program does not issue operations that set them explicitly. The goal is an access mode in which the application program can remain largely unaware of the nature of the object receiving its output or providing its input.
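A device manager can implement such defaults by starting every device from a table of preset values and overriding only what the program sets. In this sketch `TapeDriveManager`, `set_parameter` and `TAPE_DEFAULTS` are hypothetical names, and the default values are placeholders chosen for illustration, not Cronus settings.

```python
from typing import Dict

# Hypothetical defaults; a real manager would choose values suited to its device.
TAPE_DEFAULTS: Dict[str, int] = {"density_bpi": 1600, "block_size": 512}


class TapeDriveManager:
    def __init__(self) -> None:
        self.params = dict(TAPE_DEFAULTS)  # each drive starts from the defaults

    def set_parameter(self, name: str, value: int) -> None:
        if name not in self.params:
            raise KeyError(f"unknown tape parameter {name!r}")
        self.params[name] = value

    def write(self, data: bytes) -> str:
        # A program that never called set_parameter still gets a usable setting.
        return f"writing {len(data)} bytes at {self.params['density_bpi']} bpi"
```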


10.3.3 Naming


Devices, like any other Cronus objects, have names in the global Cronus symbolic namespace. These names may appear anywhere in the name hierarchy. However, most devices will probably be gathered together in the directory ":dev", as happens on UNIX systems, where a similar approach is taken to devices. For example, the most popular line-printer may be given the name ":dev:lpt". Devices may also be given more descriptive names, like ":dev:fancy_printer_in_graphic_arts_dept". Users may choose to place the name of a private device in a more convenient directory, like ":usr:melissa:my_printer" (for the printer in Melissa's office). The symbolic catalog name is used only as a convenient means of obtaining the device UID and plays no role in the way the Cronus system treats the device. (20)


(20) "Attached to" here is taken in a very loose sense. BBN-CLXX has a printer which is physically attached to a BBN-NET TAC port (and which is accessed by a number of hosts), yet it is easy to imagine a device manager for this printer being provided in the Cronus cluster.


11 User Interface


11.1 Introduction


The Cronus user interface provides uniform, convenient access to the functions and services of the Cronus distributed operating system and the subsystems which run under Cronus. User requests for access to the functions and resources of the system are similar for all DOS components. For example, a request to run a program is the same no matter where the user's access point is in the cluster, and no matter where the process that satisfies the request runs.

The user interface includes four major elements through which human users gain access to Cronus and interact with it to perform tasks.

1. The terminal manager is responsible for the behavior of the terminal or other device through which the user gains access to the system. Cronus supports a number of different terminal managers, for users who have a direct connection to the cluster or who access Cronus through the Internet.

2. The session manager controls the user session from login to termination. It uses the authentication database (through the Authentication Manager) to verify the user's principal identity, and the session record database (through the Session Record Manager) to record information about the session. It also creates parallel execution threads and allocates portions of the terminal to each thread, under user control.

3. The command language interpreter (CLI) receives requests from the user to create processes and execute programs to perform the tasks.

4. The user programs or applications that actually perform the tasks run in program carriers (see Section 5). The terminal manager, session manager, and CLI cooperate in creating these program carriers, loading them, passing parameters to them, and directing input and output to the places that the user has requested.



The design of the Cronus user interface has been influenced by the following considerations:

o The user interface should deal effectively with the distributed character of the operating system.

o Variations in cluster configurations and in user requirements will likely lead to a number of different user interfaces, and these interfaces will evolve. Therefore, the current implementation should focus on the underlying structural concepts needed to support a variety of presentation methods.

o The utility of Cronus depends on widespread accessibility. Therefore, the initial implementation should support commonly available terminals rather than the more powerful devices that are only now becoming available.

o The user interface should support system reliability and error recovery from malfunctions during a user session.

The consequences of these observations for the design of the user interface in a distributed system are explored in the next section. The following sections discuss the terminal manager, session manager, and command language interpreter, the pattern of cooperation among them, and their use of other system objects.


11.2 User Interface Design for a Distributed System


The Cronus user interface is a generalization and extension of the user interfaces provided by other computer systems. Cronus is a distributed operating system that integrates a collection of otherwise independent computer systems, so the implementation of a function may be dispersed across the cluster. The Cronus user interface is independent of the user interfaces of the COSs.
The following are some of the design objectives for the user interface that have been influenced by the distributed nature of Cronus.

1. Command invocation and program control should be uniform across the cluster.

2. Multiple parallel activities should be supported directly by the user interface.

3. The user should be able to start and control distributed activities.

4. System operation should be independent of the location of the terminal manager, session manager, CLI, and user processes.

5. The user interface should support detection of, and recovery from, malfunctions affecting only parts of a user's session.

6. The user should be able to issue commands directly to the COS.
First and foremost, Cronus itself provides for the uniform
invocation of any command. The command interpreter finds the
command in the Cronus symbolic catalog and creates a program
carrier for it. Because the symbolic name space is host
independent, commands can be organized in any manner convenient
to the user. For example, all the programs used to carry out a
particular task can be cataloged in a private directory, even if
some of them can only be executed on specific host types. The
host is normally selected by examining the type of the executable
file for the command.

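The type-based host selection just described can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the Cronus implementation: the catalog entries, the `host_type` attribute, the host names, and the "first matching host" policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutableFile:
    name: str
    host_type: str  # hypothetical attribute recording the required host type


# Hypothetical catalog: host-independent symbolic names -> executable objects.
CATALOG = {
    "task/collect": ExecutableFile("collect", "GCE"),
    "task/analyze": ExecutableFile("analyze", "VAX"),
}

# Hypothetical cluster configuration: host name -> host type.
HOSTS = {"GCE1": "GCE", "GCE3": "GCE", "CVAX": "VAX"}


def select_host(path, catalog=CATALOG, hosts=HOSTS):
    """Choose a host whose type matches the command's executable file."""
    exe = catalog[path]
    candidates = sorted(h for h, t in hosts.items() if t == exe.host_type)
    if not candidates:
        raise LookupError(f"no {exe.host_type} host available for {path}")
    # Placeholder policy: take the first candidate. A real scheduler could
    # also weigh load or the location of the command's arguments.
    return candidates[0]
```

Both commands live in the same (private) directory, yet `select_host("task/analyze")` yields the VAX while `select_host("task/collect")` yields a GCE, which is the point of a host-independent name space.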
A Cronus cluster may have more than one host of a particular
type, and different copies of reliable files are stored on
different hosts. The interface allows (but does not require) the
user to communicate an intention to use a specific instance of
any replicated resource.


A single user session may contain a number of independent
tasks executing in parallel on different hosts. In such a
session, the user can exploit the true parallelism which separate
processing elements provide and reduce the effects of
communications delays by selecting the host on which a task
executes. Cronus provides device-independent mechanisms that
support the use of a single terminal for controlling parallel
activities. The effectiveness of a particular terminal for this
purpose is, of course, dependent on the capabilities of that
device.

A programmer can develop multi-part applications in which
the individual parts (program carriers) can execute on different
hosts. To the end user, the distribution of components can
remain largely invisible, since the programmer and Cronus can
take care of the details of the distribution. In particular, a
task may consist of a multi-host pipeline of processes, in which
a process running on one host can pass its output directly to the
input of a process running on another host.


The Cronus architecture provides several kinds of access
point. Although the user interface has comparable components for
each of these access points, the location and mode of
interconnection among the components will differ. The
decomposition of function in the user interface permits flexible
distribution of these components.







On the other hand, the distribution of the components
increases the cost of synchronization and the probability that a
single host failure will affect the user session. To reduce
synchronization traffic, Cronus does not maintain a centralized
record of all elements in a user session. Rather, this data is
distributed among the managers responsible for the individual
parts. This makes the interface somewhat tolerant of failures
and provides a basis for the design of a reliable user session.


The user interface facilitates direct access to COS
functions through a user Telnet function, which can access the
COS command interpreter for the hosts of the cluster. Telnet is
treated as an activity parallel to other user activities; that
is, it is a separate thread in the user session.


11.3 Overview of a User Session


A session begins when a user activates a terminal that is
connected to Cronus and proceeds with a system login. The
session normally ends when the user logs out. During the
session, the user interacts with the system to run programs which
interrogate and manipulate Cronus resources and to perform such
job-specific functions as word processing or database inquiry.
Users gain access to Cronus in one of the following ways:

  1. Terminal access controllers (TACs). A Cronus TAC is a
     terminal multiplexer connected directly to the local area
     network. Cronus TACs are implemented in dedicated GCEs.

  2. The Internet. The Cronus local network is connected to
     the Internet by means of an Internet gateway. Users
     outside the cluster may access Cronus through the
     standard terminal handling protocol (Telnet), which
     operates on top of a lower-level, reliable transport
     protocol (TCP).

  3. Mainframe hosts. Cronus mainframe computers can have
     terminal ports that enable access to Cronus.

  4. Dedicated workstation computers. A workstation is a
     computer that is, at any given time, dedicated to a
     single user. Workstation hosts have sufficient
     processing and storage resources to support non-trivial
     application programs, such as editors and compilers, and
     to operate autonomously for long periods of time(21).

(21). The Primal system will not support workstations.

The user interface has four principal modules: a terminal
manager, a session manager, the session record manager, and the
command language interpreter.

When the user activates a terminal, the terminal manager
connects the user to a session manager. There is a session
manager for each active user. It has a limited set of commands
for initiating and manipulating sessions and session data (see
Cronus User's Manual session(1)). The login command, which
initiates a new session, performs two basic functions. First, it
identifies the user, establishes the access rights for the
session, and gets the user data needed for session
initialization. Second, it creates a session and records it in a
session record. A complete description of the session is
distributed among a number of system components, but the session
record object records the existence of the session and certain
other key items (see Cronus User's Manual session(3)).

After the session manager has identified the user, it starts
the initial subsystem specified in the user's principal object
(see Cronus User's Manual principal(3), principal(4)). This can
be either a general purpose command interpreter or a special
purpose application. The principal object may also request that
the initial subsystem be run on a specific host.

The session manager maintains session data as part of its
temporary state; that is, this information does not survive if
the session manager crashes. The session record manager, on the
other hand, maintains the basic information needed for session
recovery in non-volatile storage.

The initial subsystem runs in the first processing thread in
the session. The user may create more threads, each of which
consists of a varying number of program carrier processes
organized into a hierarchy rooted at the program carrier created
by the session agent. This program carrier is called the head
process of the thread.

Often the head process is a command language interpreter
(CLI). This is a program that interacts with the user to receive
commands, which it performs by creating and controlling
processes. In the following discussion, we assume that the head
process of the current thread is the Cronus standard command
language interpreter, which is called cli (see Cronus User's
Manual cli(1)).







The head process can execute a command that terminates the
thread. The session agent may also force the termination of a
thread. The logout command terminates a user session. At the
end of the session, the session record object is removed, and the
terminal is free to support a new session.










Instead of executing logout, the user may detach from the
session and re-attach to it later. Processes in a detached
session are no longer controlled by the session manager or from
the terminal. These processes will continue execution until they
require terminal input or output, at which point they will block
and wait until they are re-attached. When the user re-attaches
to this session, the new session manager and terminal take over
as the source of control and terminal input/output. The session
manager command resume causes the processes to continue. This
procedure is also used in recovering a session which has been
detached by a host crash.

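The detach, re-attach, and resume cycle can be modeled as follows. This is a hypothetical sketch, not the session manager's interface: the `Session` and `Process` classes are invented for illustration, a Python list stands in for a terminal, and only terminal output (not input) is modeled as the blocking event.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    BLOCKED = "blocked"  # waiting for terminal I/O while the session is detached


class Process:
    def __init__(self, name):
        self.name = name
        self.state = State.RUNNING
        self.pending = []  # output the process could not deliver while detached


class Session:
    """Hypothetical model of detach, re-attach and resume."""

    def __init__(self, processes):
        self.processes = processes
        self.terminal = None  # None means no terminal is attached

    def attach(self, terminal):
        self.terminal = terminal

    def detach(self):
        self.terminal = None

    def write(self, proc, text):
        if self.terminal is None:
            # Terminal output in a detached session blocks the process.
            proc.state = State.BLOCKED
            proc.pending.append(text)
        else:
            self.terminal.append(text)

    def resume(self):
        """Counterpart of the session manager's resume command."""
        if self.terminal is None:
            raise RuntimeError("resume requires an attached terminal")
        for proc in self.processes:
            if proc.state is State.BLOCKED:
                self.terminal.extend(proc.pending)
                proc.pending.clear()
                proc.state = State.RUNNING
```

Output written while detached is held, the writer is marked blocked, and only after a new terminal is attached and resume is issued does that output appear and the process continue.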
The user interface assigns the responsibilities for user
session activities as follows:


  o The terminal manager encapsulates the physical terminal
    device. It handles the terminal device, directs the
    keyboard input to the active process, receives the output
    from one or more active processes, and manages the
    display (for video display units).

  o The session manager initiates user authentication,
    creates a thread, starts the initial subsystem, creates
    and manages additional threads, attaches and detaches
    sessions, and assigns terminals to processes.

  o The command language interpreter reads and parses command
    line input, starts and controls processes that run the
    commands, and controls assignment of standard input and
    output.

  o The session record manager creates and maintains records
    for active and detached user sessions.

In addition, other components of Cronus support the user session;
of particular importance are the program carrier manager, the
catalog manager, and the authentication manager.


11.4 Terminal Manager


The terminal manager is the process which is closest to the
user. It provides the Cronus interface to the physical device,
through cooperation with the COS of the host to which the
terminal is connected. The terminal manager supports three broad
classes of device:


  o hardcopy terminals that are strictly line-at-a-time devices
    capable of producing upper and lower case alphanumeric
    characters and the standard ASCII control character set;

  o ASCII video terminals (often called CRT terminals or
    video display units) that support cursor addressing on a
    display screen that is large enough to support, for
    example, a full-screen editor; and

  o advanced terminals (often called bit-mapped terminals)
    that contain a processing element and enough memory to
    support multiple display areas and graphical output.


The primary focus of the Primal system is on the ASCII video
terminal because there are many of them available today. Cronus
supports the sharing of a single physical terminal device among
the parallel activities in a session. This terminal multiplexing
will be most effective when an advanced terminal is available,
but will be possible in a limited fashion with the other terminal
types.


The terminal manager encapsulates the physical terminal; the
corresponding Cronus object is of type CT_Physical_Terminal (see
Cronus User's Manual phys_term(3)), which has a number of
subtypes corresponding to the different kinds of terminals. One
or more objects (called Cronus terminals, or simply terminals in
the discussion below) of type CT_Terminal are associated with each
physical terminal. This provides a mechanism for multiplexing or
sharing the physical terminal among a number of Cronus terminals.
The Cronus terminal is the input/output device for a process.
Since terminals are Cronus objects, they have all of the usual
properties of objects, including host-independent access. In
addition to the generic operations defined on CT_Object (see
Cronus User's Manual object(3)), the following operations are
defined on objects of type CT_Terminal.

Programs may treat a Cronus object of type CT_Terminal like
an ordinary terminal, since it has a keyboard and a screen,
although either or both of these may be inactive at any time.
Each thread in a user session, and the session manager itself,
has its own object of type CT_Terminal, which will simply be
called the terminal in the discussion that follows. Within a
thread, processes coordinate their access to the terminal through
the terminal manager.

If the physical terminal supports independent display areas
(windows), the session agent maintains a window for status
displays. The rest of the physical display contains one or more
regions, each of which is used for the output of a single
terminal. The physical keyboard can be associated with only one
of the terminals at any time; the thread that owns this terminal
is the current interactive activity in the session. The keyboard
can be transferred to the session manager's terminal by a control
character sequence (see Cronus User's Manual terminal(1)). Once
the session manager is in control, the user can execute commands
to create new terminals, remove old terminals, and change the
current interactive terminal (see Cronus User's Manual
session(1)).
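The keyboard multiplexing rule above (one keyboard owner at a time, with a control sequence that hands the keyboard back to the session manager) can be sketched as follows. The class, its method names, and the choice of Control-] (`\x1d`) as the session-manager key are assumptions made for illustration; the text says only that the key is user-settable.

```python
ESCAPE = "\x1d"  # assumed session-manager key (user-settable per the text)


class PhysicalTerminal:
    """Hypothetical model of one keyboard shared by several Cronus terminals."""

    def __init__(self):
        self.terminals = {"session": []}  # the session manager's own terminal
        self.current = "session"          # terminal that owns the keyboard

    def make_terminal(self, name):
        self.terminals[name] = []

    def remove_terminal(self, name):
        if name == "session":
            raise ValueError("cannot remove the session manager's terminal")
        del self.terminals[name]
        if self.current == name:
            self.current = "session"

    def select(self, name):
        # Change the current interactive terminal.
        if name not in self.terminals:
            raise KeyError(name)
        self.current = name

    def keystroke(self, ch):
        if ch == ESCAPE:
            self.current = "session"  # keyboard transferred to session manager
        else:
            self.terminals[self.current].append(ch)
```

In this model, typing the escape key never reaches a thread; it only moves keyboard ownership, after which keystrokes are session manager commands.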






Output to any of the regions currently displayed is
immediately visible. Input is directed only to the current
thread. Normally all input characters go to a single process.
However, when one process creates another process, it may request
that certain (control) characters be intercepted and sent to it;
the interrupt facility discussed in Section 11.8 is implemented
using this facility.
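A minimal sketch of the interception rule, assuming a simple per-thread router: characters go to the reading process unless a creating process has asked for a particular control character. The `InputRouter` class is hypothetical, and Control-C (`\x03`) is only an example; the text does not name the intercepted characters.

```python
class InputRouter:
    """Hypothetical sketch of per-thread keyboard routing with interception."""

    def __init__(self, reader):
        self.reader = reader    # the process that normally receives input
        self.intercepts = {}    # control character -> intercepting process

    def request_intercept(self, ch, owner):
        # A creating (parent) process asks for ch to be sent to it instead.
        self.intercepts[ch] = owner

    def deliver(self, ch, inboxes):
        target = self.intercepts.get(ch, self.reader)
        inboxes.setdefault(target, []).append(ch)
```

If cli creates a child and requests `\x03`, then typing `ls` followed by Control-C sends `l` and `s` to the child and the control character to cli, which can then act on it as an interrupt.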














Processes invoke Read and Write operations on the terminal
to get input from the keyboard and write to the display. These
use large messages of indefinite length to provide a stream
between the terminal manager and the process. A process will
have two messages associated with the keyboard; it sends read
requests on one of them and receives the input on the other.

As keyboard input is collected, it is used to fulfill any
outstanding read operation. Since the terminal is shared among
the processes of the thread, characters are sent only in response
to a read request. If there is no outstanding request, the
terminal buffers characters until it exhausts the space allocated
for them.

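The request-driven delivery described above can be sketched as a bounded buffer that is drained only while a read request is outstanding. This is a hypothetical model: the per-request character count and the decision to discard input once the buffer is full are assumptions, since the text does not say what happens when the allocated space is exhausted.

```python
from collections import deque


class TerminalInput:
    """Hypothetical model: characters reach a process only against read requests."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.outstanding = 0   # characters requested but not yet sent
        self.delivered = []    # characters sent to the reading process
        self.dropped = 0       # characters typed after the buffer filled

    def read_request(self, count):
        self.outstanding += count
        self._drain()

    def key(self, ch):
        if len(self.buffer) >= self.capacity:
            self.dropped += 1  # assumption: input beyond the buffer is discarded
            return
        self.buffer.append(ch)
        self._drain()

    def _drain(self):
        # Send buffered characters only while a read request is outstanding.
        while self.outstanding and self.buffer:
            self.delivered.append(self.buffer.popleft())
            self.outstanding -= 1
```

Because nothing is pushed to a process without a request, whichever process of the thread is currently issuing reads is the one that receives the input.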
When control of the keyboard is transferred from one process
to another, the old process stops issuing read requests. If the
new process needs keyboard input, it establishes the two messages
used for the stream and begins issuing read requests of its own.

The PSL routines for reading and writing take care of the details
of establishing the messages, so ordinary applications need not
be concerned with them. The Write streams are not controlled:
simultaneous output from two processes in a thread may become
interleaved unless they are coordinated by the application
program logic.


Each terminal has mode settings which control its behavior.
These are discussed in detail in the Cronus User's Manual page
terminal(1). Among the most important are the following:


  1. Read activation termination character set. An input
     character from this set terminates the current read
     request. The terminal manager stops sending characters
     after it transmits the ones it has, including the
     termination character, until it receives another request.

  2. Echo control. Input echoing at the terminal manager may
     be either on or off. If it is on, it may be performed
     immediately or deferred until the characters are used to
     satisfy a read request.

  3. Buffering and local editing. Terminal input may be
     buffered until a read activation (request termination)
     character is typed. If the input is buffered, local
     editing functions are also available. If the input is
     unbuffered, it is sent as soon as it is accepted when a
     read request is active; the application process then
     assumes the responsibility for editing functions.

  4. Interrupt character handling. The user may set certain
     characters as interrupt characters; see the discussion in
     Section 11.8.
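The first three mode settings interact, and a small model makes the interaction easier to follow. This sketch is hypothetical, not the terminal manager's code: the mode names, the use of DEL (`\x7f`) as the single local-editing character, and the `"\b \b"` erase echo are all assumptions.

```python
ERASE = "\x7f"  # assumed local-editing (erase) character


class LineDiscipline:
    """Hypothetical model of the activation, echo and buffering modes."""

    def __init__(self, activation=frozenset("\r\n"), echo="immediate",
                 buffered=True):
        self.activation = activation
        self.echo = echo          # "off", "immediate" or "deferred"
        self.buffered = buffered
        self.pending = []         # typed but not yet released to a reader
        self.ready = []           # released, waiting for a read request
        self.display = []         # what has been echoed to the screen

    def key(self, ch):
        if self.buffered and ch == ERASE:
            # Local editing is only available when input is buffered.
            if self.pending:
                self.pending.pop()
                if self.echo == "immediate":
                    self.display.append("\b \b")
            return
        self.pending.append(ch)
        if self.echo == "immediate":
            self.display.append(ch)
        if not self.buffered or ch in self.activation:
            self.ready.extend(self.pending)
            self.pending.clear()

    def read(self):
        """Satisfy one read request, stopping after an activation character."""
        out = []
        while self.ready:
            ch = self.ready.pop(0)
            out.append(ch)
            if ch in self.activation:
                break
        if self.echo == "deferred":
            self.display.extend(out)
        return "".join(out)
```

In buffered mode, typing `l`, `x`, erase, `s`, newline yields the read `"ls\n"`; in unbuffered mode the erase character is passed through to the application, which then owns the editing.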


11.5 Session Manager


The session manager creates and removes user session
records, controls the allocation of the physical terminal
display, and creates and controls threads.


During a simple session, in which a user executes a series
of commands sequentially, the session agent is largely invisible
to the user. The user may, however, wish to initiate and control
parallel activities. Each collection of parallel activities is a
thread. Session threads are objects of type CT_Thread. At any
time during the session, the user can instruct the session agent
to create additional threads which operate in parallel with other
existing threads(22). A thread can be used to support parallel
processing or to maintain the state of some activity while the
user shifts attention to another activity.

The first process created in a thread is called the head
process, and is usually a command language interpreter. The
default head process is an instance of the principal's initial
subsystem, but the user may select any program in the Cronus
symbolic namespace.


A new thread is created whenever a Telnet connection is
opened, with the Telnet process at its head. The connection may
be to any Internet host, either within or outside the cluster.
For the foreseeable future, Telnet paths to cluster hosts will be
needed to support activities not yet incorporated into Cronus,
such as maintenance of the COS.


The following commands are supported directly by the session
manager (see Cronus User's Manual session(1)):


     Start a new session (login)
     Terminate a session (logout)
     Attach session agent to an existing session (attach)
     Detach session agent from an existing session (detach)
     Initiate a parallel activity (create_thread)
     Terminate a thread (kill_thread)
     Create a Cronus terminal (make_terminal)
     Remove a Cronus terminal (remove_terminal)
     Map thread to region (map_thread)
     Display threads (showthreads)
     Activate named thread (thread)
     Telnet to host (telnet)


(22). There is a user-settable control key that activates the
session manager so that the user may invoke session manager
commands.


11.6 Session Record Manager

The session record manager maintains the centrally
accessible, non-volatile record of active Cronus sessions in
objects of type CT_Session_Record (see Cronus User's Manual
sess_rec(3)). A session record object contains the following
data:


- Session UID 

- Creating principal 

- Time of Creation (for identification purposes) 
- Lists of thread UIDs 

- ACL 

- Session agent ProcessUID 


A session record is created at the beginning of each Cronus
session. During the session's lifetime, data is added and
removed by the session agent. The session record is used in
recovery after a host or system crash.


The session record can be accessed by other programs to 
report about an individual session or all current sessions. In 
addition to the generic operations, the following operations are 
defined on objects of type CT_Session_Record. 


Read_Public 
Read_Private 

Write_Session_Record
Lookup_Principal 
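The record layout and access operations listed above can be approximated as follows. This is a sketch under assumptions: the Python field names mirror the listed data, but the split between public and private data, the ACL check, and the return shapes are guesses made for illustration, not the sess_rec(3) definitions.

```python
from dataclasses import dataclass, field
import time


@dataclass
class SessionRecord:
    """Hypothetical layout mirroring the fields listed for CT_Session_Record."""
    session_uid: str
    principal: str                    # creating principal
    agent_process_uid: str            # session agent process UID
    created: float = field(default_factory=time.time)
    thread_uids: list = field(default_factory=list)
    acl: set = field(default_factory=set)  # principals allowed private reads

    def read_public(self):
        # Assumption: public data is what a session-listing program reports.
        return {"session": self.session_uid, "principal": self.principal,
                "created": self.created}

    def read_private(self, requester):
        if requester != self.principal and requester not in self.acl:
            raise PermissionError(requester)
        return {**self.read_public(), "threads": list(self.thread_uids),
                "agent": self.agent_process_uid}


def lookup_principal(records, principal):
    """Return the UIDs of all sessions created by a principal."""
    return [r.session_uid for r in records if r.principal == principal]
```

A program reporting on all current sessions would need only `read_public`, while recovery after a crash needs the thread list and agent identity behind `read_private`.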


11.7 Command Language Interpreter


A user request usually consists of a command name plus one
or more parameters or arguments. There are three basic kinds of
arguments for cli: names of objects from the Cronus catalog;
control parameters, called switches; and application-specific
parameters. Switches may be associated either with the command
as a whole, modifying its behavior, or with one or more of the
object names that appear on the command line.


If one thinks of the command as a series of words typed on a
line, the command name is the first word. The command name
specifies the action to be performed; the actual name is often a
simple English word suggesting that action, for example, print.
Cli interprets the command name as an entry in the Cronus
symbolic catalog. It expects the command name to be the symbolic
name of an object of type CT_Executable_File. Either a complete
or partial pathname may be entered on the command line. A
designated set of Cronus directories (called the search path) is
used to resolve partial pathnames; the first match encountered
causes the search to stop.


There is a small set of commands built into cli. These are
used to control the command interpreter's environment (such as
the current working directory) and the execution sequence of
commands. Executable objects may be either process images or
files containing commands. The built-in commands that control
execution sequence are most often used in command files.

The executable object may be augmented by a syntax
definition, so the command interpreter can know the number and
type of the arguments, default and legal values for the switches,
and help information for the command. Users may associate
private syntax definitions with public commands. Commands which
have syntax definitions, either private or public, are called
defined commands.


Command arguments are passed as part of the process
descriptor (see Section 5 and Cronus User's Manual process(4)) of
the new program carrier process. When the command syntax
definition is available, cli performs type and range checking on
parameters, and conversion from alphanumeric to internal
representations for certain types, including Cronus object
name and integer. Both forms are passed to the application
process, since the character string form is of use in some cases,
for example in generating error messages.
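The checking and conversion step can be sketched as follows, under the assumption that a syntax definition is an ordered list of typed argument specifications. `ArgSpec`, `CheckedArg`, and `check_args` are hypothetical names, and only integer conversion is shown; converting Cronus object names would need a catalog lookup that this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArgSpec:
    """One entry of a hypothetical syntax definition."""
    name: str
    kind: type                   # int or str in this sketch
    low: Optional[int] = None
    high: Optional[int] = None


@dataclass
class CheckedArg:
    text: str                    # original character form, kept for messages
    value: object                # converted internal form


def check_args(spec, words):
    """Type- and range-check command words before any process is created."""
    if len(words) != len(spec):
        raise ValueError(f"expected {len(spec)} arguments, got {len(words)}")
    checked = []
    for s, w in zip(spec, words):
        try:
            value = s.kind(w)
        except ValueError:
            raise ValueError(
                f"{s.name}: {w!r} is not a valid {s.kind.__name__}") from None
        if (s.low is not None and value < s.low) or \
                (s.high is not None and value > s.high):
            raise ValueError(f"{s.name}: {w!r} out of range")
        # Both forms travel together, as the text describes.
        checked.append(CheckedArg(w, value))
    return checked
```

Rejecting `copies=12` at the command interpreter avoids creating, and then failing, a program carrier on a possibly remote host.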


The syntax definition facility is particularly valuable in a
distributed environment for the following reasons:

  o The cost of remote command invocation is generally higher
    than it is in monoprocessor cases. Parameter error
    checking reduces the frequency of execution failures
    caused by usage errors.

  o If the command interpreter knows the names of some of the
    objects that the command is operating on, it may be able
    to use object location as one criterion in its selection
    of a site for command execution.

Many command arguments are cataloged objects. Cronus
supports a working directory list, which is an ordered collection
of directories that are used in relative pathname searches for
named objects. The user may change this list at any time. The
cli also supports partial name recognition. The user may press a
key to get a list of all matches for the partial name, using both
the working directory list and the standard wild-card facilities
(see Cronus User's Manual sym_name(4)) of the Cronus catalog,
from which the actual name may be chosen. There is also a
deferred recognition key which allows the user to ask for the
matching to be done, but not reported interactively.
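Both the search path used for command names and the working directory list used for object names are ordered lists of directories. The sketch below shows first-match resolution and a simple "list all matches" recognition. The catalog contents are hypothetical, complete pathnames are assumed to begin with `/`, and recognition is reduced to prefix matching rather than the full sym_name(4) wild-card syntax.

```python
# Hypothetical catalog contents: directory -> names cataloged there.
CATALOG = {
    "/user/alice/bin": {"edit", "report"},
    "/system/bin": {"edit", "print", "dir_display"},
}


def resolve(name, search_path, catalog=CATALOG):
    """Resolve a partial pathname: the first directory that has it wins."""
    if name.startswith("/"):
        return name  # assumed complete pathname, no search needed
    for directory in search_path:
        if name in catalog.get(directory, ()):
            return f"{directory}/{name}"
    raise LookupError(name)


def recognize(prefix, search_path, catalog=CATALOG):
    """List every match for a partial name, as the recognition key might."""
    return sorted(f"{d}/{n}" for d in search_path
                  for n in catalog.get(d, ()) if n.startswith(prefix))
```

With the private directory first in the list, `edit` resolves to the user's own copy, while recognition reports both candidates so that the user can choose.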


The help key can be used to display help information which
is found in the syntax description of a defined command.


The command interpreter allows a user to provide a host
designator when specifying an object name, including the name of
the command itself. For example,


edit textfile@CVAX


would invoke the editor on the copy of textfile stored on the 
Cronus VAX. 


copy file1 file2@GCE3


would make a copy of file1 under the name file2 and store the new
file on host GCE3, and


Radar@CLXX other_parameters 
would select host CLXX to run the subsystem Radar. 
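Splitting a word into an object name and an optional host designator might look like the following sketch. The function name is hypothetical, and it assumes that `@` never occurs inside an object name itself, which the text does not state.

```python
def split_host(word):
    """Split 'name@HOST' into (name, host); host is None when absent."""
    name, sep, host = word.rpartition("@")
    if not sep:
        return word, None
    if not name or not host:
        raise ValueError(f"malformed host designator: {word!r}")
    return name, host
```

Applied word by word to `copy file1 file2@GCE3`, this leaves `copy` and `file1` unconstrained and attaches host `GCE3` only to `file2`; in `Radar@CLXX` the designator applies to the command itself.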


Objects of various types may be cataloged in the Cronus
symbolic name hierarchy without restriction. Often, a user will
wish to select objects of a specific type, so a standard switch
is defined for type designation. As an example, a user would
type


dir_display file_name.*/type=reliable_file 


to display the names of those objects in the current working 
directory list that match the wildcard pattern file_name.* and 
are of type CT_Reliable_File. 
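The effect of the `/type` switch on a wild-card lookup can be sketched as below. The switch parser, the rule that turns `reliable_file` into `CT_Reliable_File`, and the `(name, type)` listing are assumptions made for illustration; `fnmatchcase` stands in for the Cronus wild-card matcher.

```python
import re
from fnmatch import fnmatchcase

# A trailing '/key=value' switch, e.g. '/type=reliable_file' (assumed syntax).
SWITCH = re.compile(r"/(\w+)=([\w.]+)$")


def parse_switches(word):
    """Peel trailing switches off a word, returning (pattern, switches)."""
    opts = {}
    while (m := SWITCH.search(word)):
        opts[m.group(1)] = m.group(2)
        word = word[:m.start()]
    return word, opts


def cronus_type(value):
    # Assumed mapping: 'reliable_file' -> 'CT_Reliable_File'.
    return "CT_" + "_".join(p.capitalize() for p in value.split("_"))


def dir_display(arg, entries):
    """entries: (name, type) pairs from the working directory list."""
    pattern, opts = parse_switches(arg)
    wanted = cronus_type(opts["type"]) if "type" in opts else None
    return [name for name, typ in entries
            if fnmatchcase(name, pattern) and (wanted is None or typ == wanted)]
```

Without the switch, every name matching `file_name.*` is listed; with it, entries of other types are filtered out.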


Cli performs two kinds of initialization. First, internal
variables are set from a profile data file, which consists of
lists of (name, value) pairs. This file can be maintained using
editkey (see Cronus User's Manual editkey(1)). Second,
cli executes a profile command file.


After cli has collected and parsed a command, it creates a
program carrier object, loads it with the executable image and
starts it. Normally the process uses the same terminal as the
command interpreter does. Therefore, cli releases control of the
terminal to the user process and waits for it to terminate
before collecting another command.


Cli uses the program carrier process support for input and
output redirection (see Section 5 and Cronus User's Manual
prog_carr(3)). Redirection is indicated by the punctuation
character >; thus the command


dir_display file_name.* >newfile.lst


would place the result of the catalog lookup of file_name.* in
the file newfile.lst. When cli redirects output into a file
whose name did not previously appear in the Cronus catalog, it
creates a new primal file. The user may use the standard switch
(/type) to designate another type, for example,


dir_display file_name.* >newfile.lst/type=reliable_file

will create a reliable file to receive the output. 
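The redirection rule (a primal file is created by default, another type is created if `/type` is given, and an already cataloged target is reused) can be sketched as a small planning function. Its name, the whitespace-split command line, and the requirement that `>` be attached to the target name are assumptions of this sketch.

```python
import re

TYPE_SWITCH = re.compile(r"/type=(\w+)$")


def plan_redirection(command_line, catalog):
    """Return (argv, target, type_to_create) for a command using '>'.

    catalog maps already cataloged names to their types (a stand-in for the
    Cronus catalog). type_to_create is None when the target already exists.
    """
    argv, target, new_type = [], None, None
    for word in command_line.split():
        if not word.startswith(">"):
            argv.append(word)
            continue
        target = word[1:]
        requested = None
        m = TYPE_SWITCH.search(target)
        if m:
            requested = m.group(1)
            target = target[: m.start()]
        if target not in catalog:
            # A name not yet cataloged gets a new file: primal by default.
            new_type = requested or "primal_file"
    return argv, target, new_type
```

The command words are passed on unchanged, and only the target name and the type to create are pulled out for the program carrier's redirection setup.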


The user can specify that two or more commands should be
executed simultaneously and linked together linearly, in such a
way that the output of each command becomes the input to the
next one. We refer to the collection as a pipeline. Since the
components of a pipeline may be on different hosts, the user can
dynamically construct multi-machine distributed commands.
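The linear linking of a pipeline can be illustrated with chained generators. This is only an analogy under stated assumptions: each stage here is a local Python function tagged with a hypothetical host name, whereas in Cronus each stage would be a separate program carrier and the connection a message stream between hosts.

```python
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterable[str]], Iterator[str]]


def run_pipeline(source, stages):
    """Chain (host, stage) pairs so each stage's output feeds the next.

    The host names are only labels here; in Cronus each stage would be a
    separate program carrier, possibly on a different host.
    """
    stream = iter(source)
    placement = []
    for host, stage in stages:
        stream = stage(stream)
        placement.append(host)
    return list(stream), placement


def grep(word: str) -> Stage:
    def stage(lines: Iterable[str]) -> Iterator[str]:
        return (line for line in lines if word in line)
    return stage


def upper(lines: Iterable[str]) -> Iterator[str]:
    return (line.upper() for line in lines)
```

Because generators are lazy, each line flows through all the stages as it is produced, which mirrors the simultaneous execution the text describes rather than a batch hand-off.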


11.8 User Processes 


In most cases, the actual work of an application is carried out
by a user process that is created in response to a command issued
to cli. These user processes are program carriers, and make use
of all of the properties of those objects. Objects of type
CT_Program_Carrier have been discussed in Section 6.5.

Application programs typically make extensive use of the PSL. In
this section, we discuss interrupts and user error reporting,
both of which are supported by the PSL.


Sometimes a process needs to be terminated by an interrupt
or signal. Cronus supports two forms of interrupt: a hardkill,
which terminates the process immediately without giving it the
opportunity for application-specific termination processing, and
a softkill, which gives the application process the opportunity to
terminate cleanly. In the event that programs do not respond to
softkill requests, a hardkill can be imposed. Interrupts are
usually invoked by typing a control sequence during a user
session, but they can also be generated by a command (see Cronus
User's Manual signal(1)).


Programs may choose to receive softkill signals and use
them for application-specific purposes unrelated to process
termination. Cli will always receive the hardkill signal and
remove the application process.
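The softkill-then-hardkill escalation can be sketched as follows. `ProgramCarrier` here is a hypothetical stand-in rather than the Cronus program carrier interface, and a real implementation would wait for a grace period before checking whether the softkill succeeded.

```python
class ProgramCarrier:
    """Hypothetical stand-in for a program carrier; not the Cronus API."""

    def __init__(self, name, handles_softkill=True):
        self.name = name
        self.handles_softkill = handles_softkill
        self.alive = True
        self.cleaned_up = False

    def softkill(self):
        # The application may clean up and exit, or ignore or repurpose it.
        if self.handles_softkill:
            self.cleaned_up = True
            self.alive = False

    def hardkill(self):
        # Immediate termination: no application-specific cleanup runs.
        self.alive = False


def stop(proc):
    """Try a softkill first; impose a hardkill if the process survives it."""
    proc.softkill()
    # A real implementation would allow a grace period before this check.
    if not proc.alive:
        return "softkill"
    proc.hardkill()
    return "hardkill"
```

A cooperative program terminates cleanly on the softkill; one that ignores it (or uses it for another purpose) is still removed by the hardkill, but without running its cleanup.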


Interrupts invoke the Stop operation on program carrier
objects. The exact implementation on a particular host depends
on the facilities of the COS that are available to the program
carrier manager.

The processes created by cli form a hierarchy of program
carrier objects, which may be decomposed into sub-hierarchies of
the thread object. Any subtree of the thread hierarchy is called
a process group. An entire thread is the largest process group.
Process groups are managed by the program carrier manager in the
current implementation. Operations on process groups support
convenient control and cleanup of process subtrees.

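Treating a process group as a subtree can be sketched with a plain tree. The `Node` class and its `stop_group` method are hypothetical; they show only why a single group operation is a convenient way to clean up everything a command started.

```python
class Node:
    """Hypothetical program carrier node in a thread's process hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.running = True
        if parent is not None:
            parent.children.append(self)

    def group(self):
        """The process group rooted here: this node and all descendants."""
        members = [self]
        for child in self.children:
            members.extend(child.group())
        return members

    def stop_group(self):
        # One operation on the group cleans up the whole subtree.
        for member in self.group():
            member.running = False
```

Stopping the group rooted at a command's process removes it and everything it spawned, while the head process and unrelated siblings keep running; stopping the group rooted at the head process covers the whole thread.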
Methods for reporting errors in Cronus are designed to
support a variety of program structures and execution
environments. There are two basic program structures:

  o Asynchronous processes, often called manager processes
    because object managers are of this class. These processes
    receive messages from a number of sources and may not wait
    if they issue requests to other managers to satisfy incoming
    requests. Error handling in manager processes is discussed
    in Section 4.6.

  o Synchronous processes, which process data that arrives in a
    more or less predictable fashion, often from a terminal or a
    file. When these processes send messages, they usually wait
    for a reply.

We have identified the following execution environments:

  o Independent processes are asynchronous processes,
    particularly object managers that are daemon processes
    started by the Monitoring and Control System or by another
    daemon process.

  o Interactive processes may be either synchronous or
    asynchronous. In this environment, a human user carries on a
    conversation with the process. Examples of processes in
    interactive environments include the traditional
    applications of distributed systems: multi-host database
    systems, office automation, and program development systems.

  o Pipelined processes consist of two or more programs, which
    might normally be run in an interactive environment, that are
    connected in such a way that the upstream process writes its
    output on the input of the downstream process. A pipeline
    can span host boundaries.

  o Background processes are generally interactive programs
    which are set into execution in such a way that the data
    which normally comes from the user is found somewhere else
    (usually in a file).


In the interactive case, where the error is reported
directly to the user, we have a situation that is similar to the
one in an ordinary, centralized operating system. It can be seen
that error handling is similar in the pipeline and background cases.


A program in an interactive environment will also report
certain errors to the Monitoring and Control System (MCS). These
include errors caused by system resource limitations and some
kinds of access control violations.


Independent processes, including Cronus managers, report
errors to the client which issued the original request, and may
also send a message to the MCS. In addition, Cronus managers
keep statistics on the kinds of errors which have been detected,
and report them to the MCS periodically.


The responsibility to terminate or continue processing
belongs with the application or manager, so PSL routines never
take pre-emptive action, and never terminate the process. The
PSL routine cannot understand the situation well enough to exit
properly, since the routine may be executed within an atomic
transaction, or within a composite action which has other
work-in-progress entries (see Section 4.6). Instead, it sets
parameters describing the condition in an error block (see Cronus
User's Manual error(4)), and the application error handler fields
the error and processes it.
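The division of labor described above (the library routine records the condition, the application decides what to do) can be sketched in Python. This is a minimal illustration assuming a dictionary stands in for the file system; `ErrorBlock`, `psl_read`, `application`, and the error code are hypothetical names, not the PSL interface documented in error(4).

```python
from dataclasses import dataclass
from typing import Optional

E_NO_SUCH_FILE = 1   # hypothetical error code; the real list is in the User's Manual


@dataclass
class ErrorBlock:
    """Hypothetical error block: it describes a failure but never acts on it."""
    code: int = 0
    routine: str = ""
    detail: str = ""


def psl_read(files: dict, name: str, err: ErrorBlock) -> Optional[bytes]:
    """Library-style routine. On failure it fills in `err` and returns None;
    it never exits, because it cannot tell whether the caller is inside an
    atomic transaction or a composite action that still needs cleanup."""
    if name not in files:
        err.code, err.routine, err.detail = E_NO_SUCH_FILE, "psl_read", name
        return None
    return files[name]


def application(files: dict, name: str) -> str:
    """The application, not the library, decides whether to continue."""
    err = ErrorBlock()
    data = psl_read(files, name, err)
    if data is None:
        # Application-level handler: clean up, then report, retry, or exit.
        return f"handled error {err.code} in {err.routine}: {err.detail}"
    return data.decode()
```

The key design point is that `psl_read` has no exit path of its own; every decision about termination stays in `application`.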


The standard error list may be found in the general
Introduction to the Cronus User's Manual. Each PSL routine page
in Section 2 of the Cronus User's Manual lists the errors which
may occur during the execution of the function. In most
cases, an interactive process would perform any necessary
cleanup, and then use the standard error reporting routines (see
Cronus User's Manual error(é)).
oat te 
Y at 
Whenever an error is detected in processing a request from a
client process, the error condition is reported through the reply
message. The error procedure uses the standard message
structure, and certain assigned keys. When it is necessary to
report an error to the MCS, the process uses a standard routine
to generate the message to the MCS (see Cronus User's Manual
error(2)).
12 Monitoring and Control


12.1 System Capabilities


The monitoring and control system (MCS) for Cronus includes
monitoring and control of hosts and of the Cronus functions on
these hosts, of the network substrate, and of gateways. The
monitoring and control station provides the functionality of an
operator's console for the Cronus Distributed Operating System.
The MCS treats Cronus as an integrated system, decomposed by
function rather than by host. Where practical, it also monitors
and controls Constituent Operating System (COS) functions from
the same station, but such functions are limited by our desire to
modify the COSs as little as possible. The discussion in this
section includes elements of the Reliable System as well as of
the Primal System. These additions are included to ensure that
the Primal System design does not interfere with future
extensions.
Cronus is restarted from the Monitoring and Control System.
For some hosts, the MCS will invoke functions already on the
hosts. In other cases (for example, GCEs which have no disks),
the MCS will download the host to start Cronus.
Network monitoring and control of a local area cable-based
network such as the Ethernet is relatively simple. It includes
detection and reporting of changes in host availability, and
monitoring and controlling traffic levels on the cable. Cable
utilization and the traffic level of each host are measured.
Priority or allowable traffic density may be set for each host.
Transmissions from a host may be stopped altogether.
12.2 System Model for Monitoring and Control


Cronus consists of a set of services(23) and low-level
system support entities, including the Cronus IPC mechanism. The
MCS is a set of processes on a Cronus host, but its functions can
be executed from anywhere in the cluster.


(23) A Cronus service is a process which performs Cronus
operations in response to requests from other Cronus processes.
All object managers, for instance, are services.


Figure 12.1 Structure of the MCS (diagram labels: MCS Service, Data Reduction, Cronus IPC, Cronus File System)


The MCS monitors both the support layer and the services. 
The set of services is extensible, and the MCS is designed to
accommodate new services.


The MCS is based on a functional decomposition rather than
on a site-based decomposition of the system. For example, one
service monitor monitors all file system managers while another
monitors authentication managers. The MCS will be aware of
distinctions between sites and will distinguish them in its
reports.


12.3 Structure of the MCS


The MCS runs as one or more Cronus processes. The MCS
station is not bound to any particular site, although certain
information gathering functions are most conveniently performed
at one location. It uses the Cronus file system, in which it
will store data, and the Cronus IPC facility. The MCS will be
divided into two parts. The first part is the interactive
section, which does on-line data collection, display, and control
of Cronus. It obtains status information from host and service
probes, and incorporates it into its own data base. The second
part performs data reduction and generates reports.


The interactive section of the MCS consists of a very low-level
module and a higher-level module (see figure 1). The
majority of the MCS resides in the high-level module, a Cronus
service which communicates with its probes through the Cronus
interprocess communication facility. The low-level module uses
only the lowest level of network protocol (User Datagram
Protocol). This primitive lower level can be relied upon when
little of Cronus is functioning. This portion will be
implemented first. It provides the functions required to
bootstrap Cronus, to examine and alter memory on Cronus hosts,
and to do simple monitoring of the Cronus network.


There are two types of reports to the MCS: polled messages
and traps. Polled messages are reports in response to a request
from the MCS. Traps are unsolicited reports from probes. They
normally represent unexpected or unusual events.


The MCS uses polled messages as the primary data gathering 
technique. The polling request provides a mechanism which will 
quickly recognize when a host or service disappears. 
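Poll-based detection of a vanished host or service, together with the low-rate polling of down hosts mentioned in footnote (24), can be sketched as follows. The `PollTracker` class, the miss threshold, and the polling interval are illustrative assumptions, not values from the Cronus design.

```python
from typing import Dict, List, Set


class PollTracker:
    """Hypothetical liveness tracker for polled hosts or services.

    A target that misses `max_missed` consecutive polls is marked down.
    Down targets are still polled, but only once every `down_interval`
    rounds, so a lost "host up" trap is not fatal.
    """

    def __init__(self, max_missed: int = 3, down_interval: int = 10):
        self.max_missed = max_missed
        self.down_interval = down_interval
        self.missed: Dict[str, int] = {}
        self.up: Dict[str, bool] = {}
        self.round = 0

    def add(self, host: str) -> None:
        self.up[host] = True
        self.missed[host] = 0

    def hosts_to_poll(self) -> List[str]:
        """Up hosts every round; down hosts only at the low rate."""
        return [h for h, is_up in self.up.items()
                if is_up or self.round % self.down_interval == 0]

    def record_round(self, polled: List[str], answered: Set[str]) -> None:
        for h in polled:
            if h in answered:
                self.missed[h] = 0
                self.up[h] = True          # a reply brings a down host back up
            else:
                self.missed[h] += 1
                if self.missed[h] >= self.max_missed:
                    self.up[h] = False
        self.round += 1
```

Requiring several consecutive misses avoids declaring a host down because of a single lost datagram.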


Traps are employed for reports about specific events which
may require real-time response, or which are unanticipated. For
instance, the crash of a service would be reported as a trap, so
that service restoration or reconfiguration could be instituted
immediately. A host coming up would similarly be reported by a
trap message, because of the timeliness of the information and
because a new host on the network might not get any unsolicited
polls(24).


The MCS contains a trap logging service. Trap reports are
generated by both host and service probes. Trap messages include
a service type and priority in their header, so that display
routines can easily determine which traps require immediate
display in a high-priority window, and so that the operator can
easily select all traps in a priority range from a given service
class (e.g. file system). The trap logger could be extended to
permit automatic operations in response to traps, so that a
"service crashed" trap report could be used to force a restart of
the service from the MCS.
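A minimal Python sketch of a trap logger that routes on the header fields alone and supports the suggested automatic-action extension. The field names, the priority scale (higher is more urgent), and the threshold are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Trap:
    service_type: str   # header field, e.g. "file_system"
    priority: int       # header field; hypothetical scale, higher is more urgent
    text: str


@dataclass
class TrapLogger:
    urgent_threshold: int = 8     # hypothetical cutoff for the high-priority window
    log: List[Trap] = field(default_factory=list)
    urgent_window: List[str] = field(default_factory=list)
    # Optional extension: automatic actions keyed on the trap text.
    actions: Dict[str, Callable[[Trap], None]] = field(default_factory=dict)

    def record(self, trap: Trap) -> None:
        self.log.append(trap)
        # The urgent-window decision uses only header fields.
        if trap.priority >= self.urgent_threshold:
            self.urgent_window.append(f"[{trap.service_type}] {trap.text}")
        action = self.actions.get(trap.text)
        if action is not None:
            action(trap)

    def select(self, service_type: str, lo: int, hi: int) -> List[Trap]:
        """All logged traps from one service class within [lo, hi]."""
        return [t for t in self.log
                if t.service_type == service_type and lo <= t.priority <= hi]
```

Registering a callback for "service crashed" corresponds to the forced-restart extension mentioned above; in a real MCS the callback would issue the restart request.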


The display processes normally direct critical reports to
the system operator, with each process controlling one or more
text streams. A text stream may be directed to a display
terminal window, a hardcopy output device, a file, or several
different places. The operator terminal should support a
multi-window display, which will enable the operator to monitor a
variety of aspects of system operation simultaneously, with one
window usually reserved for critical reports. Other windows will
be created to present data as requested. For instance, an
operator might choose a process in one window which presents the
general status for all hosts in the network, and another window
to present the load status for a particular host of interest.
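Directing one text stream to several places at once amounts to a fan-out of each line to every registered destination. The sketch below assumes each destination can be modeled as a writable text object; `TextStream` and its methods are hypothetical.

```python
import io
from typing import List, TextIO


class TextStream:
    """Hypothetical display-process output stream, fanned out to any number
    of destinations (terminal window, hardcopy device, file, ...)."""

    def __init__(self, name: str):
        self.name = name
        self.sinks: List[TextIO] = []

    def direct_to(self, sink: TextIO) -> None:
        self.sinks.append(sink)

    def write_line(self, line: str) -> None:
        # Every destination receives an identical copy of each line.
        for sink in self.sinks:
            sink.write(f"{self.name}: {line}\n")
```

Because each window is just another sink, a single-window terminal and a multi-window one look the same to the process producing the stream.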


When the sophisticated window package is not available, a
simpler interface would enable the operator to monitor one window
at a time. The difference would be invisible to the MCS, since
each window would look to it like an independent display.


The data reduction facilities of Cronus can reside wherever
convenient, and will be regarded as background tasks. The
integrity of the system does not depend on their availability,
but their reports should prove useful to the tuning and
management of the network.


that the files generated by the interactive section are available
globally as part of the Cronus file system.


(24) Polling for hosts which are known to Cronus but currently
down would continue at a low rate, however, so that a lost trap
for such a host coming up would not be fatal.
12.4 Host Probes, Service Probes, and Network Monitoring


A host probe is a primitive entity which every Cronus host
must provide to report status to the MCS. A host probe must at
least report the presence of the host and its internet address at
the time the host operationally enters Cronus, and must respond
to AREYOUTHERE messages broadcast from the MCS. The host probe
is the distributed part of the low-level section of the
monitoring and control system. A host probe will often offer
further information in its report: host type, probe reports
available, current MCS reports, Cronus services, level of
integration, etc.
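The minimum behavior required of a host probe (announce presence on entry, answer AREYOUTHERE) is small enough to sketch. The message names come from the text; the plain-text, space-separated encoding and the `HostProbe` class are assumptions, and the UDP transport used by the low-level section is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostProbe:
    """Hypothetical host probe. Message names follow the text; the plain-text,
    space-separated encoding is an assumption."""
    name: str
    internet_address: str
    boot_class: int
    boot_file: str

    def hereiam(self) -> str:
        """Presence report, sent when the host enters Cronus and on request."""
        return (f"HEREIAM {self.name} {self.internet_address} "
                f"{self.boot_class} {self.boot_file}")

    def handle(self, message: str) -> Optional[str]:
        """Answer an AREYOUTHERE broadcast; ignore anything else."""
        if message.strip() == "AREYOUTHERE":
            return self.hereiam()
        return None
```

A richer probe would append the optional items listed above (host type, available probe reports, services, and so on) to the same reply.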


Service probes are monitoring entities in all Cronus
services. Services to be monitored will include object managers,
terminal concentrators, and user authenticators. Service probes
reflect a functional rather than site-based decomposition of
Cronus. Data from related service probes on different hosts are
combined in the MCS, in order to present a more understandable
picture of the service. The MCS specifies what types of data
should be collected and reported through poll responses and
through traps.
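Combining reports from related service probes on different hosts is essentially a group-by on service type. The sketch below assumes each poll response arrives as a (service, host, metric, value) tuple and that summation is the right combining rule; both are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical poll response: (service type, host, metric name, value).
Report = Tuple[str, str, str, float]


def combine_by_service(reports: List[Report]) -> Dict[str, Dict[str, float]]:
    """Functional rather than site-based view: sum each metric over every
    host that runs the same service."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for service, _host, metric, value in reports:
        totals[service][metric] += value
    return {service: dict(metrics) for service, metrics in totals.items()}
```

The host field is discarded here; a fuller version would keep per-site detail so that reports can still distinguish sites.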


A service probe is located within the service. Unlike host
probes, service probes may require a certain level of Cronus
functions, since the loss of service monitoring and control does
not compromise our ability to restart the system. Service probes
use the full range of Cronus services, especially the Cronus IPC
facility.


Some messages, including control messages and high-priority
monitoring, will run with a priority above that of the service.
Most monitoring, however, will run with a priority below that of 
the service itself. 














The service probe for the Cronus file system reports the
loading on the local portion of the file system, the number of
requests for various classes of services, etc. It may also
include the ability to trace all activities on particular files
(using traps) as a debugging aid.
The process manager probe reports machine process loading,
both for Cronus and non-Cronus processes, and optionally supports
tracing services for activities on Cronus processes. The probe
will report certain classes of exceptional events on processes,
and will provide services, invokable from the MCS, for invoking
and killing processes, and for tracing process activity on a
per-process basis.


Gateway monitoring would normally fall into the category of
service monitoring; however, the gateway already reports status
in response to polling by a host. We will use this capability to
obtain gateway and internet status. Since we do not wish to do
development in this area, we will restrict ourselves to the
available capabilities.


The MCS will not monitor the cable network traffic directly.
Rather, it will gather reports from hosts on the traffic sent,
traffic received, and the collision rate at each node.
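Since the MCS learns about cable traffic only from host reports, cable-wide figures must be derived from them. A minimal sketch, assuming a hypothetical per-interval report layout: total cable traffic is counted once at each sender (a received packet was also reported as sent by some other host), and nodes whose reported collision rate exceeds a chosen threshold are flagged.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrafficReport:
    """Hypothetical per-host report for one interval (field names assumed)."""
    host: str
    packets_sent: int
    packets_received: int
    collision_rate: float    # fraction of transmissions that collided


def cable_traffic(reports: List[TrafficReport]) -> int:
    """Packets carried on the cable, counted once at the sender."""
    return sum(r.packets_sent for r in reports)


def busiest_senders(reports: List[TrafficReport], n: int) -> List[str]:
    """Hosts contributing the most traffic, largest first."""
    ranked = sorted(reports, key=lambda r: r.packets_sent, reverse=True)
    return [r.host for r in ranked[:n]]


def high_collision_nodes(reports: List[TrafficReport], threshold: float) -> List[str]:
    """Nodes whose reported collision rate exceeds the threshold."""
    return [r.host for r in reports if r.collision_rate > threshold]
```

Output from `busiest_senders` is the kind of information an operator would use before lowering a host's allowed traffic density or stopping its transmissions.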
12.5 Loading and Debugging Support


The control function has the capability for restarting
Cronus on the hosts of the network. It may do this in one of two
ways. In some cases (e.g. GCE), this includes transmitting the
code directly to the host to be loaded. In other cases, the
computer's own loading sequence is invoked, using its private
secondary storage. In no event should the downloading procedure
require the assistance of a third machine. Some machines may
detect some of their own failures and restart themselves.


A distributed, heterogeneous system such as Cronus poses
special problems for debugging tools. The goal is to have a
sophisticated debugger which runs on one host and debugs on
another. We would like a single debugging system to be
capable of debugging computers of differing architectures.
Moreover, we would like the debugger to be able to debug at
source language level to provide for efficient development.
Currently, the leading candidate for developing such a tool is
XMD, which is adapted from the multi-window editor PEN. XMD does
not currently debug code in high-level languages, but can be 
extended in this direction, since it does not depend on the 
structure of the debugged code, relying instead on symbol table 
entries to provide it with information about the target code. 
XMD may soon be extended to debug C source code as part of the 
effort of another project at BBN. 


12.6 Cronus Initialization 


The initialization of Cronus is performed from the
Monitoring and Control Station. In initializing the system, the
MCS will have no certain knowledge of what hosts are available.
The first step is to poll for the available hosts, and then to
initialize each host which responds.


The initialization of Cronus proceeds as follows(25) (see the
scenario in Section 13):


1. The MCS broadcasts AREYOUTHERE onto the network.


2. Each host has a routine in its COS that listens for
   AREYOUTHERE and responds with HEREIAM and the
   parameters (a) name, (b) internet address, (c) boot
   class, (d) boot file name, and any other required
   information. The name is printable. The boot class
   indicates the method used to initialize the host.
   Class 1 hosts accept a BOOTYOURSELF command and
   initialize local Cronus software upon its receipt.
   Class 2 hosts require a BOOTLOAD command, which is
   followed by a boot file (item d) which is passed to
   the host with the code to load. Class 3 hosts require
   a host-specific loading protocol, which is executed on
   the MCS from the boot file. (There are no plans to
   implement Class 3 hosts in the ADM.)


3. When the MCS receives a HEREIAM message, it enters the
   addresses of the host in a host monitor table, with a
   notation that the host is not up. It then sends a
   BOOTYOURSELF message if it is a Class 1 host, or a
   BOOTLOAD followed by the required file if it is a Class
   2 host.


4. When a host has completed Cronus initialization, it
   sends a message BOOTDONE to the MCS. Alternatively, it
   may send the message BOOTFAIL, possibly with parameters
   indicating the reason (e.g. "missing file block 5"). The
   MCS may then retry the boot, if appropriate.


5. After the host is initialized, the MCS will communicate
   with it using the Cronus IPC mechanism. It will
   normally obtain a list of available services and will
   then ask it to start up the services it supports.


(25) These messages do not use the full Cronus IPC mechanism in
the first four steps of the procedure, since the operation switch
and primal process manager are not in place on the host being
initialized. Instead, they will be implemented as VLN messages.
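Steps 3 and 4 of the procedure above amount to a small state machine per host on the MCS side. The Python sketch below assumes an outbox of (address, message) pairs in place of real VLN messages, a "BOOTLOAD <file>" text encoding, and a fixed retry limit for BOOTFAIL; all three are illustrative choices rather than part of the design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class HostEntry:
    address: str
    boot_class: int
    boot_file: str
    state: str = "not up"          # "not up" -> "booting" -> "up"
    attempts: int = 0
    last_failure: str = ""


@dataclass
class BootController:
    """Hypothetical MCS-side sketch of initialization steps 3 and 4."""
    max_attempts: int = 3          # assumed retry policy for BOOTFAIL
    table: Dict[str, HostEntry] = field(default_factory=dict)
    outbox: List[Tuple[str, str]] = field(default_factory=list)  # (address, message)

    def on_hereiam(self, name: str, address: str, boot_class: int, boot_file: str) -> None:
        entry = HostEntry(address, boot_class, boot_file)   # noted as "not up"
        self.table[name] = entry
        self._boot(entry)

    def _boot(self, entry: HostEntry) -> None:
        entry.attempts += 1
        if entry.boot_class == 1:
            entry.state = "booting"
            self.outbox.append((entry.address, "BOOTYOURSELF"))
        elif entry.boot_class == 2:
            entry.state = "booting"
            self.outbox.append((entry.address, "BOOTLOAD " + entry.boot_file))
        # Class 3 (host-specific protocol) is not implemented, as in the ADM.

    def on_bootdone(self, name: str) -> None:
        self.table[name].state = "up"

    def on_bootfail(self, name: str, reason: str = "") -> None:
        entry = self.table[name]
        entry.state, entry.last_failure = "not up", reason
        if entry.attempts < self.max_attempts:
            self._boot(entry)
```

Step 5 would begin only after `on_bootdone`, when the host can be reached through full Cronus IPC.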
The initialization procedure requires a small amount of code
resident in each processor in order to respond to the MCS
messages. This code will fit in ROM on machines which do not
have secondary storage.


12.7 Siting the Monitoring and Control System
Should the MCS be located on the GCE or on an application
host? Using a GCE is desirable because it can be specially
configured to support the MCS, it is intended to be the dedicated
processor, it provides controlled, predictable performance with
dedicated, low cost hardware, and it is expected to be
redundantly available. Since UNIX hosts may not be available
redundantly, we would less often have back-up service if we used
a UNIX application host for the MCS. On the other hand,
building the MCS on an application host has several advantages:
the UNIX host provides a much richer development environment;
... have already been written for UNIX, so that less program
development would be necessary; and we can take advantage of the
set of available UNIX utilities.
For the near term, we will build the tools on UNIX. We will
be careful to code the routines in a portable manner, so we can
easily move them to a GCE environment. This provides us with the
benefit of using UNIX in the short term, while keeping the
eventual goal of relying on redundant GCEs for Cronus services.


12.8 Phased Implementation 


Implementation of the monitoring and control station will
occur in phases, both in terms of functionality, and in terms of
reliability and performance. The functionality will be increased
both as the reporting capabilities of the probes increase, and
as the need for data analysis grows.


Initially, the MCS will exist on a single host, without
strong reliability or performance goals. We will first build the
host monitoring section of the MCS, and simple host probes in
order to be able to start and restart Cronus hosts and services,
and to record the status (up/down) of hosts. As services are
written, we will add service probes, and extend the MCS to
monitor them. This initial system will utilize the UNIX file
system until the Cronus primal file system is available, and will
then convert to the use of Cronus files. Later the MCS will
reside on a GCE and will use standard Cronus files.


* Revision 1 1 83/06/06 10:39.29 bjw 
13 Scenarios of Operation


13.1 Basic User Commands and Functions 


This section presents examples of the use of Cronus 
functions and of the integration of structural units. Scenarios 
are presented for typical system and application tasks. The 
intent is to suggest the interactions through the flow of control
and shared data. The scenarios also suggest how the primitive 
functions might be combined to support operations required of 
modern operating systems. The first few sections are narrative, 
and the later ones provide pseudo-code examples. Details of 
syntax and calling sequences in these examples are not those of 
the actual implementation. 


Many of the user commands and functions of Cronus fall into
the following categories:


Session initiation and termination: Login, Logout,
Attach, etc.


User and system data base status and maintenance: display
and edit user records and access control lists, show logged
on users, etc.


File manipulation and file/directory maintenance: name
lookup, read, write, directory listing, etc.


Program invocation and control: create process, terminate
process, etc.


Input/Output: list file, etc.


System operation: starting the system, monitoring
components, etc.


Each of the following sections presents a scenario from one
of these categories.
13.2 Registering a New User


New users may be added to the system only by members of the
administrative group. The command to create a principal entry
issues an Invoke operation specifying the logical name for the
principal data base manager (CL_Principal) as the target process,
and including the Create_Principal operation and its parameters
in the message text. The Invoke uses the Locate(CL_Principal)
operation to find an available principal data base manager, then
sends the message text to one of the sites that responds, using
SendToHost. The site identifier may be cached to simplify
subsequent requests. The principal data base manager creates a
user entry and returns the unique identifier for the new object.
This UID is the Cronus internal name of the principal, and will
appear in Access Group Sets and Group specifications. It may
also be used to identify the user record whenever that record
needs to be accessed.
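The Invoke path just described (Locate on a logical name, SendToHost to one responder, cache the site) can be sketched in Python. Here `locate` and `send_to_host` are stand-in callables for the Cronus primitives, and choosing the first responder is an assumed policy; the real calling sequences differ, as noted at the start of this section.

```python
from typing import Callable, Dict, List


class Invoker:
    """Hypothetical sketch of Invoke on a logical name such as CL_Principal."""

    def __init__(self, locate: Callable[[str], List[str]],
                 send_to_host: Callable[[str, str, dict], dict]):
        self.locate = locate              # stand-in for Locate(<logical name>)
        self.send_to_host = send_to_host  # stand-in for SendToHost
        self.cache: Dict[str, str] = {}   # logical name -> chosen site

    def invoke(self, logical_name: str, operation: str, params: dict) -> dict:
        host = self.cache.get(logical_name)
        if host is None:
            responders = self.locate(logical_name)
            if not responders:
                raise LookupError(f"no manager responded for {logical_name}")
            host = responders[0]          # any responder will do in this sketch
            self.cache[logical_name] = host
        return self.send_to_host(host, operation, params)
```

A production version would also drop a cached site that stops answering and fall back to a fresh Locate.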


When a principal is added, a number of user data base
entries are initialized. One of those is the priority range
authorized for the user. A private directory is created, and the
principal is given all rights to it. The pathname for this
directory is entered as the default home directory for the
principal. The home directory serves as the repository for
command interpreter profile data that specifies user-customizable
system features.


13.3 Login 


A user may connect to Cronus either through Telnet and a
standard session agent running on a shared Cronus host, or
through a Cronus Terminal Access Computer (TAC). Telnet supports
access from outside the cluster through gateways, and from other
devices obeying the protocol.


Access through a Cronus terminal device process is available
only from a host that supports Cronus interprocess communication
protocols and will probably be supported only on workstations or
Cronus TACs. It is more powerful, because the access point
software is fully integrated with Cronus.
To initiate a session, a user must have a terminal device
process to manage his terminal communication, and a session
controller process to manage interactions with the system.
Telnet access requires both processes to execute on a shared host
of the system. A workstation access path can support both
processes; a Cronus TAC access path places the terminal device
process in the TAC and the session controller process on a shared
host.


Login is handled by the Cronus session controller process.
The user is prompted for a login name and password, which are
used by the session controller process to build a request to the
Authentication Manager by invoking the operation:


Authenticate_As(name, encrypted_password)


On receiving this message, the Authentication Manager retrieves 
the associated principal data base entry, verifies the password, 
and creates the Access Group Set for the process. 


The Authentication Manager interacts with the Cronus Session
Manager to record the session. The Session Manager assigns a
session identifier and adds it to the table of active sessions.
A session record contains the UIDs of the session principal,
controller process, and terminal device process. This table is
used to satisfy status requests about the cluster and active
users. Some emergency procedures (for example, destroying all
processes associated with a session) may also rely upon this
table.
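A minimal Python sketch of the Session Manager's active-session table, showing the two uses named above: answering status requests and finding the recorded processes of a session. The sequential identifier scheme and the `SessionManager` interface are assumptions.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SessionRecord:
    principal_uid: str
    controller_uid: str     # session controller process
    terminal_uid: str       # terminal device process


class SessionManager:
    """Hypothetical active-session table keyed by session identifier."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)          # assumed: sequential identifiers
        self.active: Dict[int, SessionRecord] = {}

    def record(self, principal_uid: str, controller_uid: str, terminal_uid: str) -> int:
        sid = next(self._ids)
        self.active[sid] = SessionRecord(principal_uid, controller_uid, terminal_uid)
        return sid

    def logged_on(self) -> List[str]:
        """Status request: distinct principals with an active session."""
        return sorted({r.principal_uid for r in self.active.values()})

    def processes_of(self, sid: int) -> List[str]:
        """Processes recorded for a session, e.g. for an emergency shutdown."""
        r = self.active[sid]
        return [r.controller_uid, r.terminal_uid]
```

Only the two processes stored in the record are returned; other processes belonging to the session would have to be found through the process managers.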


The session identifier, the AGS, and other user data base
entries are placed in the process environment through an 
interaction with the process manager for the authenticating 
process. 


After modifying the process environment to indicate 
successful authentication, Authenticate_As returns the principal 
UID to the authenticated process. This identifier is used to
interrogate the user data base for other information needed to
complete the login sequence. One such item is the default home
directory, the symbolic name of the initial Cronus directory
which is used for unrooted catalog lookup operations, including
the search for additional user initialization data. The
directory name is converted to a catalog entry UID by an
interaction with the catalog manager, and the UID is stored in
the process descriptor.


A principal may have a default program registered with the 
Authentication Manager; if so, this program is executed at login 
time. If no program is specified, the standard command
interpreter is assumed. The standard input and output for the 
executing process are directed to the principal's terminal device 
process. 


13.4 Accessing a File


Each process descriptor contains (among other things) an 
entry for the UID of the current directory. This value is 
initialized at login to the principal's home directory, but can 
be modified during the course of the session. The current 
directory is inherited by a new program carrier process.


Suppose a client process wants to read the first 500 bytes
of data in the primal file with the symbolic name :a:b:c. To do
this, it would obtain the UID for the primal file by means of:


Lookup(nullDirUID, ":a:b:c", true)
-> abDirUID, abcCatEntUID, abcCatEntContents


By convention, the UID for the null directory, nullDirUID, is
used to specify the starting directory whenever a complete name
is to be looked up. Next, it would read the file data by means
of:


Read(abcCatEntContents.ObjectUID, 0, 500)


which would cause the primal file manager to send the first 500 
bytes of data for the file. 


These operations are made available by a single function 
call in the Process Support library. 


ReadFileData(":a:b:c", 0, 500)
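The composition that ReadFileData performs (Lookup from the null directory, then Read on the object UID found in the catalog entry) can be sketched with in-memory stand-ins for the catalog and primal file managers. `Catalog`, `PrimalFileManager`, `read_file_data`, and the UID strings are hypothetical, and the sketch resolves complete names only.

```python
from typing import Dict, Tuple

NULL_DIR_UID = "UID-null"   # hypothetical stand-in for nullDirUID


class Catalog:
    """Hypothetical in-memory catalog manager for complete names only."""

    def __init__(self, entries: Dict[str, Tuple[str, str, str]]):
        # full symbolic name -> (containing dir UID, catalog entry UID, object UID)
        self.entries = entries

    def lookup(self, start_dir_uid: str, name: str):
        if start_dir_uid != NULL_DIR_UID:
            raise ValueError("this sketch only resolves complete names")
        dir_uid, ent_uid, obj_uid = self.entries[name]
        return dir_uid, ent_uid, {"ObjectUID": obj_uid}


class PrimalFileManager:
    """Hypothetical file manager holding file contents by UID."""

    def __init__(self, data: Dict[str, bytes]):
        self.data = data

    def read(self, file_uid: str, offset: int, count: int) -> bytes:
        return self.data[file_uid][offset:offset + count]


def read_file_data(catalog: Catalog, files: PrimalFileManager,
                   name: str, offset: int, count: int) -> bytes:
    """Lookup followed by Read, the two steps ReadFileData combines."""
    _dir_uid, _ent_uid, contents = catalog.lookup(NULL_DIR_UID, name)
    return files.read(contents["ObjectUID"], offset, count)
```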
Now, assume that a process has a relative symbolic name for a
file. The current directory UID is included in the request to
the catalog to look up the file name. Using the general form of
Invoke, the catalog manager is found based on the hint in the
catalog entry UID. The catalog manager performs the lookup and
returns the primal file UID associated with the symbolic name.
The primal file UID is then used to find the file manager for
this object, again using the hint which is part of the file UID
to locate the manager.
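Hint-based location can be sketched as "ask the hinted host first, fall back to a broadcast Locate if it no longer manages the object". The assumptions here are that a hint may be stale (for example after an object moves) and that the hint can be extracted from the UID by a supplied function; the `host/serial` UID format used in the usage example is hypothetical.

```python
from typing import Callable, List, Optional


def find_manager(uid: str,
                 host_hint: Callable[[str], str],
                 manages: Callable[[str, str], bool],
                 broadcast_locate: Callable[[str], List[str]]) -> Optional[str]:
    """Hypothetical sketch: try the host named by the hint in the UID first;
    use a broadcast Locate only when the hinted host does not manage it."""
    hinted = host_hint(uid)
    if manages(hinted, uid):
        return hinted
    candidates = broadcast_locate(uid)
    return candidates[0] if candidates else None
```

In the common case the hint is correct and no broadcast is needed, which is the point of embedding it in the UID.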


13.5 Creating a File


A Cronus cluster may contain many hosts with file managers,
each willing to store and retrieve file data at the request of
other processes. The operation


Locate(CL_Primal_File)


can be invoked by a process to determine the set of accessible
primal file managers.


One policy for the creation of files might be to try to
create the file on the same host as the creating process if a
local primal file manager responded. If this is not possible, a
remote manager can be selected and asked to create the file. The
primal file managers include status information in their
responses, such as the amount of unused disk storage available,
a measure of the current I/O and processor load, or a restriction
on the principal UIDs that may create files through this
manager. This information can be used to select a storage site
for the file. The selection strategies are packaged in library
routines in the Process Support Library.
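One possible selection strategy of the kind described above, sketched in Python: filter out managers without enough free space or that refuse the principal, prefer the local manager, otherwise take the least loaded. The `ManagerStatus` fields and the tie-breaking rule are assumptions, not the PSL's actual strategy.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class ManagerStatus:
    """Hypothetical status fields in a Locate(CL_Primal_File) response."""
    host: str
    free_bytes: int
    load: float                                          # combined I/O and CPU load
    allowed_principals: Optional[FrozenSet[str]] = None  # None means unrestricted


def choose_storage_site(responses: List[ManagerStatus], local_host: str,
                        principal: str, size: int) -> Optional[str]:
    """Prefer the local manager; otherwise the least-loaded eligible remote,
    breaking ties in favor of more free space."""
    eligible = [m for m in responses
                if m.free_bytes >= size
                and (m.allowed_principals is None or principal in m.allowed_principals)]
    for m in eligible:
        if m.host == local_host:
            return m.host
    if not eligible:
        return None
    return min(eligible, key=lambda m: (m.load, -m.free_bytes)).host
```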


The file may need a symbolic catalog entry. The catalog
entry operation is carried out by the catalog manager of the
directory to which the file is being added.


Suppose that the client process wants to create a file and
to give it the symbolic name :a:b:c. Further suppose that a
directory named :a:b already exists. First the client would use
the


Create -> FileUID


operation to create a new primal file. The file would be empty.
The client could write data into the file by means of


Write(FileUID, BytePosition, Data)


or by bracketing the write(s) by


Open(FileUID, ReadWrite, Frozen)


and


Close(FileUID, RetainWrites)


operations.


To catalog the file, the client first obtains the UID of the
directory that will contain the catalog entry for the new name:


Lookup(nullDirUID, ":a:b", true)
-> aDirUID, abCatEntUID, abCatEntContents


and then enters the new name:


Enter(abCatEntContents.ObjectUID, "c", FileUID)
-> abcCatEntUID


If there were no directory :a:b or :a, then the client would
first have to create both :a and :a:b. This could be done as
follows. First the client would obtain the UID for the root
directory. By convention the name of the root directory is
:Root. The fact that the root directory is cataloged in itself
represents the only violation of the tree-structured property of
the Cronus symbolic name space.


Lookup(nullDirUID, ":Root", true)
-> rootDirUID, rootCatEntUID, rootCatEntContents


Next, the client would create the directory :a:


CreateDir(rootDirUID, "a")
-> aDirUID, aCatEntUID


and then it would create the directory :a:b:


CreateDir(aDirUID, "b")
-> abDirUID, abCatEntUID


At this point, the symbolic name :a:b:c can be established, as
above, for the primal file.
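The create-the-missing-directories walk described above generalizes to any path. A minimal Python sketch over a hypothetical in-memory name space (directory UID mapped to its entries), with the root cataloged in itself as `Root`; `NameSpace`, `ensure_dirs`, and the UID format are assumptions, and the sketch does not check whether an existing component is a file rather than a directory.

```python
import itertools
from typing import Dict


class NameSpace:
    """Hypothetical in-memory catalog: directory UID -> {name: object UID}."""

    def __init__(self) -> None:
        self._uids = itertools.count(1)
        self.root = self._new_uid()
        # The root is cataloged in itself, the one exception to the tree shape.
        self.dirs: Dict[str, Dict[str, str]] = {self.root: {"Root": self.root}}

    def _new_uid(self) -> str:
        return f"UID-{next(self._uids)}"

    def create_dir(self, parent: str, name: str) -> str:
        """CreateDir: make a directory and catalog it under `parent`."""
        uid = self._new_uid()
        self.dirs[uid] = {}
        self.dirs[parent][name] = uid
        return uid

    def enter(self, parent: str, name: str, obj_uid: str) -> None:
        """Enter: catalog an existing object under `parent`."""
        self.dirs[parent][name] = obj_uid


def ensure_dirs(ns: NameSpace, path: str) -> str:
    """Walk a path such as ':a:b' from the root, creating any missing
    directories, and return the UID of the last one."""
    current = ns.root
    for component in [c for c in path.split(":") if c]:
        child = ns.dirs[current].get(component)
        current = child if child is not None else ns.create_dir(current, component)
    return current
```

Calling `ensure_dirs` a second time with the same path creates nothing, so it is safe to use before every `Enter`.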


The Process Support Library contains routines coupling the
creation and naming of files, to avoid the situation where a
failure produces a file which does not have a symbolic catalog
entry and hence is not easily accessed. The operations are
ordered such that the symbolic name is entered before the file is
closed. If the process fails after the name is entered, the
catalog entry may be deleted by explicit user commands, or by
automatic recovery mechanisms.


13.6 Deleting a File


Suppose the name of the file to be deleted is :a:b:c.
Deletion is accomplished by the following operations:


Lookup(nullDirUID, ":a:b:c", true)
-> abDirUID, abcCatEntUID, abcCatEntContents


Delete(abcCatEntContents.ObjectUID)


Remove(abcCatEntUID)


If the primal file and catalog manager are coupled, the
Delete operation could have the side effect of invoking the
Remove operation.
13.7 Listing a Symbolic Catalog Directory


Suppose the name of the directory is :a:b:c. A utility
program executes the following sequence of operations to print
the desired file names:


InitScan(nullDirUID, ":a:b:c:*")
  -> abcScanState,
     xDirUID,
     xCatEntUID,
     xCatEntContents


repeat until abcScanState indicates end of scan
  { if TypeOf(xCatEntContents.ObjectUID) = A_filetype
      then print xCatEntContents.SymbolicName;
    ScanDirectory(abcScanState)
      -> abcScanState,
         xDirUID,
         xCatEntUID,
         xCatEntContents
  }
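The scan loop above maps naturally onto a Python generator, whose suspended state plays the role of abcScanState. This sketch assumes directory entries can be modeled as (symbolic name, object type) pairs and uses shell-style wildcards via `fnmatch.fnmatchcase`; the Cronus wildcard syntax may differ.

```python
from fnmatch import fnmatchcase
from typing import Iterator, List, Tuple

Entry = Tuple[str, str]   # hypothetical shape: (symbolic name, object type)


def scan_directory(entries: List[Entry], pattern: str) -> Iterator[Entry]:
    """Stand-in for InitScan/ScanDirectory: yields one matching entry per
    step; the generator's saved position acts as the scan state."""
    for name, obj_type in entries:
        if fnmatchcase(name, pattern):
            yield name, obj_type


def list_files(entries: List[Entry], pattern: str = "*") -> List[str]:
    """Collect (rather than print) names whose objects are files."""
    return [name for name, obj_type in scan_directory(entries, pattern)
            if obj_type == "A_filetype"]
```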


13.8 Running a Program 




Application programs are executed within program carrier
objects. The creation of an application process has three steps:
a program carrier is created, the program carrier is loaded with
the program image, and the program carrier is started.


The program image will generally be obtained from a Cronus
file, which may be anywhere within the Cronus file system. A
routine that combines these process creation steps will be
available in the PSL. This routine takes as one of its arguments
the symbolic name of the program image file. The symbolic name
is translated to the file UID by means of a symbolic catalog
lookup, and the file UID is used to load the program image into
a new program carrier object.
In a heterogeneous system, a particular program image can
only be executed on certain processors. A VAX program image, for
example, can only be executed on a VAX host. Some mechanism must
exist to match the program image to a processor capable of
executing it.


Subtypes of program carriers are defined for each processor
architecture, for example, CT_VAX_Program_Carrier. These subtypes
contribute no new operations to objects of type
CT_Program_Carrier, but provide a means of locating a specific
kind of processor. For example, the operation

    Locate(CL_VAX_Program_Carrier)

will attempt to locate all program carrier managers on VAX hosts.


Executable files are subtypes of primal files with the type
CT_Executable. The descriptor of a program image file contains
the logical name of a program carrier subtype, e.g.,
CL_VAX_Program_Carrier. The file descriptor may also contain
other information, such as special host requirements. An
operation on program carrier managers, Resource_Test, determines
if a particular manager has the resources which are prerequisites
to execution; the Create_Process routine can invoke this test
whenever a process has special needs.
The actions carried out by the library routine can now be
described in greater detail:

1. The symbolic program name is translated to an executable
   file UID by means of a symbolic catalog lookup.

2. The routine requests the file descriptor of the program
   image file by invoking the Read_Descriptor operation on the
   file object.

3. The required program carrier type and any special
   requirements are determined from the file descriptor.

4. A Locate operation finds the Program Carrier Managers
   capable of executing this process, and a Resource_Test
   operation narrows the candidates further.

5. A Program Carrier Manager is selected according to some
   policy (26) and the operation Create_Program_Carrier is
   invoked on it; the UID of the new Program Carrier object
   is returned.

6. The Load_Program operation is invoked on the program
   carrier object.

7. When the load operation is complete, the routine receives
   a reply from the Program Carrier object, and then invokes
   Proceed on the Program Carrier to start it.
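The seven steps, together with the selection policy suggested in footnote (26), can be walked through in a toy model. This is a sketch only: `ProgramCarrierManager`, `create_process`, the `load` figure, the `needs` descriptor key and the integer carrier UIDs are all hypothetical, and local method calls stand in for Cronus operation invocations.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProgramCarrierManager:
    host: str
    carrier_type: str                        # e.g. "CL_VAX_Program_Carrier"
    load: float                              # assumed to be reported in the Locate reply
    resources: set = field(default_factory=set)
    carriers: Dict[int, dict] = field(default_factory=dict)

    def resource_test(self, needs: set) -> bool:
        return needs <= self.resources

    def create_program_carrier(self) -> int:
        uid = len(self.carriers) + 1         # toy UID, unique per manager only
        self.carriers[uid] = {"image": None, "running": False}
        return uid


def select_manager(candidates, local_host):
    """Footnote (26) policy: local host if a candidate, else the lightest load."""
    for m in candidates:
        if m.host == local_host:
            return m
    return min(candidates, key=lambda m: m.load)


def create_process(name, catalog, descriptors, managers, local_host):
    file_uid = catalog[name]                                  # 1. catalog lookup
    desc = descriptors[file_uid]                              # 2. Read_Descriptor
    ctype = desc["carrier_type"]                              # 3. carrier type ...
    needs = set(desc.get("needs", ()))                        #    ... and requirements
    found = [m for m in managers if m.carrier_type == ctype]  # 4. Locate
    found = [m for m in found if m.resource_test(needs)]      #    Resource_Test
    if not found:
        raise LookupError(f"no program carrier manager for {ctype}")
    mgr = select_manager(found, local_host)                   # 5. policy ...
    pc_uid = mgr.create_program_carrier()                     #    Create_Program_Carrier
    mgr.carriers[pc_uid]["image"] = file_uid                  # 6. Load_Program
    mgr.carriers[pc_uid]["running"] = True                    # 7. Proceed
    return mgr, pc_uid
```

Preferring the local host avoids remote loading when possible; the load-based fallback is only one of the many policies the footnote mentions.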


The Create_Program_Carrier operation takes as an implicit
parameter the process descriptor of the creating process, which
is inherited (with certain changes) by the new process.


A process descriptor contains some information which is
maintained securely by the system (e.g., the process UID and the
UID of its principal) and an open-ended set of information
inserted into the descriptor by the Change_Process_Descriptor
operation. All of the open-ended information is inherited
directly by the descendants of the process. Some of the system
information is inherited (e.g., the principal is normally
inherited) and some of it is not (e.g., the process UID of a
descendant is unique to it). The system information defines the
authority of the new process for access to information and
resources.


The creating process may invoke Change_Process_Descriptor
after creating, but before starting, the program carrier to make
changes in the descriptor.


(26) A reasonable policy might select the Program Carrier manager
on the local host if it is a candidate, and select the most
lightly loaded host (from information in the reply to Locate) if
it is not. Many other policies are possible, and exploring the
possibilities is an important area of future work.
13.9 Starting a Cronus Service

In this section we sketch a scenario which might be directed
by a cluster control station to start up, operate, and take down
a Time Service instance on one host. It is indicative of the
steps required to initiate and control an initial process load
sequence. The steps required to bring up each host to the point
assumed in this scenario have been discussed in Section 12.
The Cronus Time Service has two main functions:

1. To respond to direct requests for the date and time, and
   for format conversions among the Cronus date and time
   formats.

2. To periodically multicast the date and time on a well-known
   VLN multicast channel.
Assume that host CVAX has joined the Cronus system, and the
primal process manager is the only active Cronus process. The
control station performs

    InvokeOnHost("CVAX",
                 CL_Primal_Process,
                 <(CK_Operation_Name, CO_Service_List)>)

and receives in reply a list of the services which could be
created on CVAX; only the PPM is marked as active. The logical
name CL_Time_Service is contained in the list. The control
station then performs








    InvokeOnHost("CVAX",
                 CL_Primal_Process,
                 <(CK_Operation_Name, CO_Create_Primal_Process),
                  (CK_UID_Service_Name, CL_Time_Service)>)


The Time Service process is created and started, and the control
station receives a reply containing CVAX_Time_Service_UID, the
specific UID of the Time Service Primal Process. The Time
Service begins its work, and if left undisturbed will
periodically multicast the date and time forever. The control
station (or any other Cronus process) could request the current
date and time by performing

    InvokeOnHost("CVAX",
                 CL_Time_Service,
                 <(CK_Operation_Name, CO_Date_Time)>)

At some later time, it becomes necessary to temporarily
inhibit the periodic multicasts of the Time Service. The
control station performs

    InvokeOnHost("CVAX",
                 CVAX_Time_Service_UID,
                 <(CK_Operation_Name, CO_Change_Process_Descriptor),
                  (CK_Modify, ),
                  (CK_IPCEnabled, "false")>)
After the control station receives the reply confirming this
operation, it is known that all IPC to or from the Time Service
has been inhibited. The Time Service process continues to exist,
however, and is eventually restored to its normal function when
the control station performs

    InvokeOnHost("CVAX",
                 CVAX_Time_Service_UID,
                 <(CK_Operation_Name, CO_Change_Process_Descriptor),
                  (CK_Modify, ),
                  (CK_IPCEnabled, "true")>)


Finally, perhaps in preparation for replacing the Time Service
code with a new version, the control station does

    InvokeOnHost("CVAX",
                 CVAX_Time_Service_UID,
                 <(CK_Operation_Name, CO_Destroy)>)

and the Time Service process is known to be destroyed when the
reply arrives at the control station.
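The control-station scenario can be replayed against a toy model in which each request is an ordered list of (key, value) pairs, mirroring the <(...), (...)> notation. This is a sketch only: `ToyHost`, `field_value`, the string-valued keys and the integer process UIDs are hypothetical stand-ins rather than the Cronus message format, and the (CK_Modify, ) pair and the CO_Date_Time request are omitted.

```python
from typing import List, Tuple

# A request: ordered (key, value) pairs, as in <(K1, V1), (K2, V2)>.
Message = List[Tuple[str, object]]


def field_value(msg: Message, key: str) -> object:
    """Return the value paired with `key` in a message."""
    for k, v in msg:
        if k == key:
            return v
    raise KeyError(key)


class ToyHost:
    """Hypothetical host: a primal process manager plus the processes it creates."""

    def __init__(self, services):
        self.services = list(services)   # services that could be created here
        self.processes = {}              # process UID -> state
        self.next_uid = 1

    def invoke(self, target, msg: Message):
        op = field_value(msg, "CK_Operation_Name")
        if target == "CL_Primal_Process" and op == "CO_Service_List":
            return list(self.services)
        if target == "CL_Primal_Process" and op == "CO_Create_Primal_Process":
            uid, self.next_uid = self.next_uid, self.next_uid + 1
            self.processes[uid] = {"service": field_value(msg, "CK_UID_Service_Name"),
                                   "ipc_enabled": True}
            return uid                   # plays the role of CVAX_Time_Service_UID
        if op == "CO_Change_Process_Descriptor":
            enabled = field_value(msg, "CK_IPCEnabled") == "true"
            self.processes[target]["ipc_enabled"] = enabled
            return "ok"
        if op == "CO_Destroy":
            del self.processes[target]
            return "ok"
        raise ValueError(f"unsupported operation {op!r} on {target!r}")
```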


14 Primal System Hardware






The Advanced Development Model of the Cronus distributed
operating system will have three mainframe computers, four GCEs,
and a gateway. The mainframe computers are two BBN C70s and a
Digital Equipment Corporation VAX 11/750, the GCEs are Multibus
computers with M68000 central processors, and the gateway is a
DEC LSI-11 based computer.


The C70 computers are configured as general development
machines. The first, C70-1, is the site of the majority of the
development work since it supports both the C70 development tools
and those of the GCEs. We will rent time on a second C70, C70-2,
which will be used to exercise Cronus support for reliable
redundant hosts, and to test scalability. Both C70s will run
UNIX version 7 as released by BBN Computer Corporation and
modified by the Cronus project.


The VAX 11/750 provides a VMS-based software development
environment, as well as a mainframe of a distinctly different
architecture. Its purpose in the ADM is to provide a limited
integration host. Since it is a large, well-supported mainframe,
it will contain its own development environment, and we will also
use it as a source of computer power for general tasks, both to
off-load the C70 and to test real usage of the Cronus
heterogeneous host environment. The VAX is configured to reflect
its usage as a software development machine.


The Cronus system has four GCEs, configured for a variety of
tasks. Since they are compatible machines, their configurations
will vary over time, as we perform different experiments on the
network, and as we make board substitutions to make one GCE
perform functions of another which is temporarily out of service.
The configuration table for the GCEs should be regarded as only a
typical set of GCE configurations.
The Cronus gateway is implemented on a DEC LSI-11 computer.
This would normally be a task for a GCE; however, standard
internet gateways are currently implemented on the LSI-11, and
adoption of the LSI-11 gateway allows us to obtain an
off-the-shelf implementation. The next generation of internet
gateways is expected to be built on M68000 computers, and at that
time we will probably move the gateway to a GCE.
C70-1        1 Mbyte main storage
             Two 80 Mbyte removable disk drives
             Magnetic tape drive, 800/1600 bpi, 125 ips (Cipher)
             Arpanet 1822 LHDH interface
             Ethernet interface (using Interlan protocol module)

C70-2        1/2 Mbyte main storage
             Two 160 Mbyte removable disk drives
             Arpanet 1822 LHDH interface
             Ethernet interface (using Interlan protocol module)

VAX 11/750   1 Mbyte main memory
             One 160 Mbyte Winchester disk
             Magnetic tape drive, 1600 bpi, 40 ips
             MDI high speed synchronous serial interface
             3COM Ethernet interface
             VMS operating system


Table 14.1 Software Development Hosts
GCE-1,2      Forward Technology M68000 processor
             with 256 Kbytes memory
             Micro-Memory 256 Kbyte memory board
             80 Mbyte Winchester disk drive and SMD interface
             3COM Ethernet interface
             8-slot Multibus backplane

GCE-3        Forward Technology M68000 processor
             with 256 Kbytes memory
             Micro-Memory 256 Kbyte memory board
             8-line RS-232 serial interface
             3COM Ethernet interface
             8-slot Multibus backplane

GCE-4        Forward Technology M68000 processor
             with 256 Kbytes memory
             Micro-Memory 256 Kbyte memory board
             8-line RS-232 serial interface
             300 lpm line printer
             3COM Ethernet interface
             8-slot Multibus backplane


Table 14.2 Generic Computing Elements: Typical Configurations
Gateway      LSI-11/03 processor card
             64 Kbyte memory card
             DLV11-J 4-line terminal card
             MRV11-C ROM card (bootstrap)
             ACC 1822 interface with DMA
             Interlan NI2010 QBUS Ethernet controller
             BBN FNV11 Fibernet interface
             MDB backplane and power supply


Table 14.3 Gateway Configuration
15 Virtual Local Network

15.1 Purpose and Scope

The Cronus Virtual Local Network (VLN) provides interhost
message transport in the Cronus Distributed Operating System.
The VLN client interface is available on every Cronus host.
Client processes can send and receive messages using specific,
broadcast, or multicast addressing.

The VLN stands in place of a direct interface to the
physical local network (PLN). This additional level of
abstraction is defined to meet two major system objectives:
o Compatibility. The VLN is compatible with the Internet
  Protocol (IP) and with higher-level protocols, such as the
  Transmission Control Protocol (TCP), based on IP.

o Substitutability. Cronus software built above the VLN is
  dependent only upon the VLN interface and not its
  implementation. It is possible to substitute one physical
  local network for another provided that the VLN interface
  specification is satisfied.


This description assumes the reader is familiar with the
concepts and terminology of the DARPA Internet Program. Reference
[NIC 1982] is a compilation of the important protocol
specifications and other documents. Documents in [NIC 1982] of
special significance here are [Postel 1981a] and [Postel 1981b].

The Advanced Development Model (ADM) will be connected to the
ARPANET, and it is important that the ADM conform to the
standards and conventions of the DARPA internet community. In
addition, a large body of software has evolved, and continues to
evolve, in the internet community. For example, protocol
compatibility permits Cronus to assimilate existing software
components providing electronic mail, remote terminal access, and
file transfer.


The substitutability goal reflects the belief that different
instances of Cronus will use different physical local networks.
Substitution may be desirable for reasons of cost, performance,
or other properties of the physical local network such as
mechanical and electrical ruggedness.
Figure 15.1 shows the position of the VLN in the lowest layers
of the Cronus protocol hierarchy. The VLN interface
specification leaves programming details of the interface and
host-dependent issues unspecified. The precise representation of
the VLN data structures and operations will vary from machine to
machine, but the functional capabilities of the interface are the
same regardless of the host.


    +-------------------+-------------------+-------+
    |   Transmission    |       User        |       |
    |      Control      |     Datagram      |  ...  |
    |     Protocol      |     Protocol      |       |
    +-------------------+-------------------+-------+
    |               Internet Protocol               |
    |                     (IP)                      |
    +-----------------------------------------------+
    |             Virtual Local Network             |
    |                     (VLN)                     |
    +-----------------------------------------------+
    |            Physical Local Network             |
    |             (PLN, e.g. Ethernet)              |
    +-----------------------------------------------+

    Figure 15.1 Cronus Protocol Layering


The VLN is completely compatible with the Internet Protocol
as defined in [Postel 1981b]. No changes or extensions to IP are
required to implement IP above the VLN.
15.2 The VLN-to-Client Interface

The VLN layer provides a datagram transport service among
hosts in a Cronus cluster, and between these hosts and other
hosts in the DARPA internet. The hosts belonging to a cluster
are attached to the same physical local network. Communication
with hosts outside the cluster is achieved through internet
gateways connected to the cluster, as shown in Figure 15.2. The
VLN routes datagrams to a gateway if they are addressed to hosts
outside the cluster, and delivers incoming datagrams to the
appropriate VLN host. A VLN is a network in the internet, and
thus has an internet network number (27).
(27). The network numbers for the PLN and VLN may be the same or
different. If the numbers are different, the gateways are
somewhat more complex. Either approach is consistent with the
internet model.
Figure 15.2 A Virtual Local Network Cluster


The VLN interface will have one client process on each host,
normally the host's IP implementation. The VLN performs no
multiplexing/demultiplexing function.


The structure of messages which pass through the VLN is
identical to the structure of internet datagrams. The VLN
definition assumes that there is a well-defined representation
for internet datagrams on any host supporting the VLN interface.
The argument name "Datagram" in the VLN operation definitions
below refers to this well-defined but host-dependent datagram
representation.


The VLN guarantees that a datagram of 576 or fewer octets
can be transferred between any two VLN clients. Although larger
datagrams may be transferred between some client pairs, clients
should avoid sending datagrams exceeding 576 octets unless there
is a clear need to do so. The sender must be certain that all
hosts involved can process the oversized datagrams.
The internal representation of a VLN datagram is not
included in the specification, and may be chosen for
implementation convenience or efficiency.
Although the structure of internet and VLN datagrams is
identical, the VLN-to-client interface places its own
interpretation on internet header fields, and differs from the
IP-to-client interface in significant respects:


1. The VLN layer uses only the Source Address, Destination
   Address, Total Length, and Header Checksum fields in the
   internet datagram; other fields are accurately transmitted
   from the sending to the receiving client.

2. Internet datagram fragmentation and reassembly is not
   performed in the VLN layer, nor does the VLN layer
   implement any aspect of internet datagram option
   processing.

3. At the VLN interface, a special interpretation is placed
   upon the Destination Address in the internet header, which
   allows VLN broadcast and multicast addresses to be encoded
   in the internet address structure.

4. With high probability, duplicate delivery of datagrams sent
   between hosts on the same VLN does not occur.

5. Between two VLN clients S and R in the same Cronus cluster,
   the sequence of datagrams received by R is a subsequence of
   the sequence sent by S to R; a stronger sequencing property
   holds for broadcast and multicast addressing.


In the DARPA internet, an internet address is defined to be
a 32-bit quantity that is partitioned into two fields, a network
number and a local address. VLN addresses share this basic
structure, but the VLN attaches special meaning to the local
address field.
Each network is assigned a class (A, B, or C) and a network
number. The partitioning of the 32-bit internet address into
network number and local address fields as a function of the
class of the network is shown in Table 15.1.


                Width of           Width of
             Network Number     Local Address

  Class A        7 bits            24 bits
  Class B       14 bits            16 bits
  Class C       21 bits             8 bits

  Table 15.1 Internet Address Formats


The bits not included in the network number or local address
fields encode the network class; e.g., a 3-bit prefix of 110
designates a class C address (see [Postel 1981a]).
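The class rules in Table 15.1 can be applied mechanically by testing the leading bits. A minimal sketch (the function name is ours), assuming the address is held as an unsigned 32-bit integer with its first transmitted bit as the most significant bit:

```python
def split_internet_address(addr: int):
    """Return (class, network number, local address) for a 32-bit address."""
    if not 0 <= addr < 2**32:
        raise ValueError("not a 32-bit address")
    if addr >> 31 == 0b0:      # prefix 0:   7-bit network, 24-bit local
        return "A", (addr >> 24) & 0x7F, addr & 0xFFFFFF
    if addr >> 30 == 0b10:     # prefix 10:  14-bit network, 16-bit local
        return "B", (addr >> 16) & 0x3FFF, addr & 0xFFFF
    if addr >> 29 == 0b110:    # prefix 110: 21-bit network, 8-bit local
        return "C", (addr >> 8) & 0x1FFFFF, addr & 0xFF
    raise ValueError("prefix 111 is not class A, B or C")
```

For example, `split_internet_address(0xC0000105)` yields `("C", 1, 5)`: the prefix 110 marks class C, leaving 21 bits of network number and 8 bits of local address.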
The interpretation of the local address field is the
responsibility of the network. For example, in the ARPANET the
local address refers to a specific physical host. VLN addresses,
in contrast, may refer to all hosts (broadcast) or groups of
hosts (multicast) in a Cronus cluster, as well as specific hosts
inside or outside of the cluster. Specific, broadcast, and
multicast addresses are all encoded in the VLN local address
field (28). The meaning of the local address field of a VLN
address is defined in Table 15.2.


(28). The ability of hosts outside a Cronus cluster to transmit
datagrams with VLN broadcast or multicast destination addresses
into the cluster may be restricted by the cluster gateway(s), for
reasons of system security.
  Address Mode      VLN Local Address Values

  Specific Host     0 to 1,023
  Multicast         1,024 to 65,534
  Broadcast         65,535

  Table 15.2 VLN Local Address Modes

In order to represent the full range of specific, broadcast, and
multicast addresses in the local address field, a VLN network
should be either class A or class B: the largest value, 65,535,
requires a 16-bit local address field, and a class C network
provides only 8 bits.
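Table 15.2 and the class requirement can be expressed as two small checks. A minimal sketch; the function names are hypothetical, and the ranges are those in the table:

```python
SPECIFIC_MAX = 1023
MULTICAST_MIN, MULTICAST_MAX = 1024, 65534
BROADCAST = 65535


def vln_address_mode(local_address: int) -> str:
    """Classify a VLN local address according to Table 15.2."""
    if 0 <= local_address <= SPECIFIC_MAX:
        return "specific"
    if MULTICAST_MIN <= local_address <= MULTICAST_MAX:
        return "multicast"
    if local_address == BROADCAST:
        return "broadcast"
    raise ValueError("outside the 16-bit VLN local address range")


def can_carry_all_modes(local_address_bits: int) -> bool:
    """True if a local field of this width can hold the broadcast value 65,535."""
    return BROADCAST < 2**local_address_bits
```

With the widths of Table 15.1, this accepts classes A (24 bits) and B (16 bits) and rejects class C (8 bits). The multicast range holds 65,534 - 1,024 + 1 = 64,511 addresses, the figure used later for the multicast address space.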

The VLN does not attempt to guarantee reliable delivery of
datagrams, nor does it provide negative acknowledgements of
damaged or discarded datagrams. It does guarantee that received
datagrams are accurate representations of transmitted datagrams.


The VLN guarantees that datagrams will not replicate during
transmission, so for each intended receiver, a datagram handed
to the VLN by higher levels is received once or not at all (29).
Between two VLN clients S and R in the same cluster, the
sequence of datagrams received by R is a subsequence of the
sequence sent by S to R; that is, datagrams are received in
order, possibly with omissions. A stronger sequencing property
holds for broadcast and multicast transmissions. If receivers R1
and R2 both receive broadcast or multicast datagrams D1 and D2,
either they both receive D1 before D2, or they both receive D2
before D1.
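Both sequencing properties can be stated as checks over observed traces. This is a sketch for testing an implementation, assuming each datagram carries a unique identifier (which the no-replication guarantee makes reasonable); the function names are hypothetical.

```python
from typing import Sequence


def is_subsequence(received: Sequence, sent: Sequence) -> bool:
    """True if `received` is `sent` with zero or more datagrams omitted."""
    it = iter(sent)   # shared iterator: each match consumes the sent stream
    return all(any(d == s for s in it) for d in received)


def consistent_order(r1: Sequence, r2: Sequence) -> bool:
    """True if every broadcast/multicast datagram received by both R1 and R2
    was received in the same relative order by each."""
    common = set(r1) & set(r2)
    return [d for d in r1 if d in common] == [d for d in r2 if d in common]
```

`is_subsequence` checks the point-to-point rule between S and R; `consistent_order` checks the stronger group rule for a pair of receivers.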


While a VLN could be implemented on a long-haul or
virtual-circuit-oriented PLN, these networks are generally
ill-suited to the task. The ARPANET, for example, does not
support broadcast or multicast addressing modes, nor does it
provide the VLN sequencing guarantees. If the ARPANET were the
base for a VLN implementation, broadcast and multicast would have
to be constructed from specific addressing, and a network-wide
synchronization mechanism would be required to implement the
guarantees. Although the compatibility and substitutability
benefits might still be achieved, the implementation would be
costly, and performance poor.


(29). A protocol operating above the VLN layer (e.g., TCP) may
employ a retransmission strategy; the VLN layer does nothing to
filter duplicates arising in this way.


A good implementation base for a Cronus VLN would be a
high-bandwidth local network with all or most of these
characteristics:

1. The ability to encapsulate a VLN datagram in a single PLN
   datagram.

2. An efficient broadcast addressing mode.

3. Natural resistance to datagram replication during
   transmission.

4. Sequencing guarantees like those of the VLN interface.

5. A strong error-detecting code (datagram checksum).

Good candidates include Ethernet, the Flexible Intraconnect, and
Pronet, among others.
15.3 A VLN Implementation Based on Ethernet


The Ethernet local network specification is the result of a
collaborative effort by Digital Equipment Corp., Intel Corp., and
Xerox Corp. The Version 1.0 specification [DEC 1980] was
released in September 1980. Useful background information on the
Ethernet internet model is supplied in [Dela 1981].
The addresses of specific Ethernet hosts are arbitrary 48-bit
quantities, not under the control of the DOS. The VLN
implementation must map VLN addresses to specific Ethernet
addresses. The mapping cannot be maintained manually in each
VLN host, because manual procedures are too cumbersome and
error-prone for a local network with many hosts, each of which
may join and leave the network frequently. A protocol is
described below which allows a host to construct the mapping
dynamically, beginning only with knowledge of its own VLN and
Ethernet host addresses.


An internet datagram is encapsulated in an Ethernet frame by
placing the internet datagram in the Ethernet frame data field
and setting the Ethernet type field to "DoD IP", as shown in
Figure 15.3.


The Ethernet octet ordering is required to be consistent
with the IP octet ordering. If IP(i) and IP(j) are internet
datagram octets with i<j, and EF(k) and EF(l) are the Ethernet
frame octets which represent IP(i) and IP(j) once encapsulated,
then k<l. Bit orderings within octets must also be consistent.
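Encapsulation and the octet-ordering rule can be sketched in a few lines. Assumptions: 0x0800 is used as the numeric value of the "DoD IP" type (the standard Ethernet type for IP), and the sketch omits minimum-length padding and the Frame Check Sequence, which it treats as handled elsewhere. The function names are hypothetical.

```python
import struct

ETHERTYPE_DOD_IP = 0x0800   # numeric value of the "DoD IP" Ethernet type


def encapsulate(dst: bytes, src: bytes, datagram: bytes) -> bytes:
    """Build an Ethernet frame (no padding, no FCS) carrying an internet datagram."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("Ethernet addresses are 6 octets")
    # The datagram octets are copied unchanged and in order, so IP(i)
    # before IP(j) implies EF(k) before EF(l), as the ordering rule requires.
    return dst + src + struct.pack("!H", ETHERTYPE_DOD_IP) + datagram


def decapsulate(frame: bytes):
    """Split a frame into (destination, source, type, data)."""
    (etype,) = struct.unpack("!H", frame[12:14])
    return frame[0:6], frame[6:12], etype, frame[14:]
```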


Each VLN component maintains a virtual-to-physical address
map (the VPMap) which translates a 32-bit specific VLN host
address to a 48-bit Ethernet address. The VPMap data structure
and the operations on it will be implemented using hashing
techniques.
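Since a Python dict is a hash table, it gives a compact sketch of the VPMap. The method names are hypothetical (only StoreVPPair is named in the text), and the assumption that a newer pair simply replaces an older one is ours.

```python
class VPMap:
    """Virtual-to-physical map: 32-bit VLN host address -> 48-bit Ethernet address."""

    def __init__(self):
        self._map = {}   # hashed on the VLN address

    def store_vp_pair(self, vln_addr: int, eth_addr: int) -> None:
        """Counterpart of StoreVPPair; a newer pair replaces an older one."""
        if not (0 <= vln_addr < 2**32 and 0 <= eth_addr < 2**48):
            raise ValueError("address out of range")
        self._map[vln_addr] = eth_addr

    def lookup(self, vln_addr: int):
        """Return the Ethernet address, or None if it is not yet known."""
        return self._map.get(vln_addr)
```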


Each host controller has an Ethernet host address (EHA) to
which it responds. The EHA is determined by Xerox and the
controller manufacturer. In addition, the VLN assigns a
multicast-host address (MHA) to each host. This multicast
address is constructed from the local host portion of the
internet address.


When the VLN client sends a datagram to a specific host, the
local VLN component encapsulates it and transmits it without
delay. The Source Address in the Ethernet frame is the EHA of
the sending host. The Ethernet Destination Address is formed
from the destination VLN address in the datagram, and is either:

o the EHA of the destination host, if the sending host knows
  it, or

o the MHA formed from the host number in the destination VLN
  address, as described above, if the sending host does not
  know the EHA corresponding to the host number.
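The choice between EHA and MHA can be sketched as follows. The MHA layout 09-00-08-00-hh-hh (octet E = 000000hh, octet F = hhhhhhhh) is the one given later in this section. Assumptions: a class B VLN, so the local address is the low 16 bits of the VLN address, and a plain dict from VLN address to 6-octet EHA standing in for the VPMap. The function names are hypothetical.

```python
MHA_PREFIX = bytes([0x09, 0x00, 0x08, 0x00])   # octets A-D of every MHA


def multicast_host_address(host: int) -> bytes:
    """MHA 09-00-08-00-hh-hh for a specific host number 0..1,023."""
    if not 0 <= host <= 1023:
        raise ValueError("specific host numbers are 0..1,023")
    return MHA_PREFIX + bytes([host >> 8, host & 0xFF])   # E = 000000hh, F = hhhhhhhh


def ethernet_destination(vln_dest: int, vpmap: dict) -> bytes:
    """Use the EHA if it is known; otherwise fall back to the MHA."""
    eha = vpmap.get(vln_dest)
    if eha is not None:
        return eha
    return multicast_host_address(vln_dest & 0xFFFF)   # class B local address
```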
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      Destination Address                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Destination Address (contd.)  |        Source Address         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Source Address (contd.)                    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Type ("DoD IP")       |Version|  IHL  |Type of Service|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |          Total Length         |         Identification        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Flags|     Fragment Offset     |  Time to Live |    Protocol   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |        Header Checksum        |        Source Address         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    Source Address (contd.)    |      Destination Address      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Destination Address (contd.)  |                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
    |                                                               |
    +                             Data                              +
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     Frame Check Sequence                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure 15.3 An Encapsulated Internet Datagram

When a VLN component receives an Ethernet frame with type
"DoD IP", it decapsulates the internet datagram and delivers it
to its client. If the frame was addressed to the EHA of the
receiving host, no further action is taken. If the frame was
addressed to the MHA of the receiving host, the VLN component
broadcasts an update for the VPMaps of the other hosts. The
other hosts can then use the EHA of this host for future traffic.


If the MHA is represented in hexadecimal as a sequence of
octets

    A  B  C  D  E  F

it has the form

    09-00-08-00-hh-hh

A is the first octet transmitted and F the last. The two octets
E and F contain the host local address:

    E = 000000hh    F = hhhhhhhh

The type field of the Ethernet frame containing the update is
"Cronus VLN", and the format of the data octets in the frame is:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Subtype ("Mapping Update")   |       Host VLN Address        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   Host VLN Address (contd.)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

When a local VLN component receives an Ethernet frame with type
"Cronus VLN" and subtype "Mapping Update", it performs a
StoreVPPair operation using the Ethernet Source Address field and
the host VLN address sent as frame data.
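The update exchange can be sketched end to end: a host that receives a "DoD IP" frame on its MHA broadcasts a Mapping Update, and every receiver stores the (VLN address, Ethernet Source Address) pair. The text gives neither numeric code, so `ETHERTYPE_CRONUS_VLN` (0x8003, which we believe was the value later registered for Cronus VLN) and `SUBTYPE_MAPPING_UPDATE` (1) are assumptions. A plain dict stands in for the VPMap, and the function names are hypothetical.

```python
import struct

ETHERTYPE_CRONUS_VLN = 0x8003   # assumption: "Cronus VLN" type value
SUBTYPE_MAPPING_UPDATE = 1      # assumption: the text gives no numeric value
BROADCAST_EHA = b"\xff" * 6     # Ethernet broadcast address


def mapping_update_frame(my_eha: bytes, my_vln_addr: int) -> bytes:
    """Broadcast frame: 16-bit subtype followed by the 32-bit host VLN address."""
    data = struct.pack("!HI", SUBTYPE_MAPPING_UPDATE, my_vln_addr)
    return BROADCAST_EHA + my_eha + struct.pack("!H", ETHERTYPE_CRONUS_VLN) + data


def handle_mapping_update(frame: bytes, vpmap: dict) -> bool:
    """StoreVPPair(host VLN address -> Ethernet Source Address) on a valid update."""
    src = frame[6:12]
    etype, subtype, vln_addr = struct.unpack("!HHI", frame[12:20])
    if etype != ETHERTYPE_CRONUS_VLN or subtype != SUBTYPE_MAPPING_UPDATE:
        return False
    vpmap[vln_addr] = src
    return True


def on_dod_ip_frame(frame: bytes, my_eha: bytes, my_mha: bytes, my_vln_addr: int):
    """Return the datagram for the client, plus an update to broadcast if the
    frame arrived on this host's MHA (otherwise None)."""
    update = mapping_update_frame(my_eha, my_vln_addr) if frame[0:6] == my_mha else None
    return frame[14:], update
```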


A VLN datagram will be transmitted in broadcast mode if the
datagram specifies the VLN broadcast address (local address =
65,535 decimal) as the destination. The receiving VLN component
merely decapsulates and delivers the VLN datagram.


The implementation of multicast addressing is more complex.
Each host defines the number of multicast addresses which can be
simultaneously "attended" (listened to). This number is a
function of the particular Ethernet controller hardware and of
the resources that the host dedicates to multicast processing.
The VLN protocol permits a host to attend any number of multicast
addresses, from 0 to 64,511 (the entire VLN multicast address
space), independent of the controller in use.


It is possible to implement the VLN multicast mode using
only the Ethernet broadcast mechanism. Every VLN host would
receive and process every VLN multicast, discarding uninteresting
datagrams. More efficient operation is possible if some Ethernet
multicast addresses are used, and if the Ethernet controller has
multicast recognition which automatically discards misaddressed
frames.


There is no standard for multicast recognition. The 3COM
Model 3C400 controller performs no multicast address recognition;
it passes all multicast frames to the host for further
processing. The Intel Model iSBC 550 controller permits the host
to register a maximum of 8 multicast addresses with the
controller, and the Interlan Model NM10 controller permits a
maximum of 63 registered addresses.


A VLN-wide constant, Multicast_Registered, is equal to the
smallest number of Ethernet multicast addresses that can be
simultaneously attended by all hosts in the VLN. A network
composed of hosts with the Intel and Interlan controllers
mentioned above, for example, would have Multicast_Registered
equal to 7 (30); a network composed only of hosts with 3COM Model
3C400 controllers would have Multicast_Registered equal to
64,511, since the controller itself does not restrict the number
of Ethernet multicast addresses to which a host may attend (31).

(30). Multicast_Registered is 7, rather than 8, because one
multicast slot in the controller is reserved for the host's MHA.
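
The definition of Multicast_Registered reduces to a minimum over the hosts in the VLN. The sketch below assumes each controller is described by the number of multicast addresses it can register, with one slot withheld for the MHA as in footnote (30). It uses a hypothetical marker, `CTRL_UNRESTRICTED`, for controllers such as the 3C400 that impose no limit.

```c
#include <stddef.h>

#define VLN_MCAST_SPACE   64511u  /* size of the VLN multicast address space */
#define CTRL_UNRESTRICTED 0u      /* hypothetical marker: controller passes all multicasts */

/* capacity[i]: multicast addresses host i's controller can register,
 * or CTRL_UNRESTRICTED if the controller does no multicast recognition. */
unsigned multicast_registered(const unsigned *capacity, size_t nhosts)
{
    unsigned result = VLN_MCAST_SPACE;
    for (size_t i = 0; i < nhosts; i++) {
        /* One controller slot is reserved for the host's MHA. */
        unsigned usable = (capacity[i] == CTRL_UNRESTRICTED)
                              ? VLN_MCAST_SPACE
                              : capacity[i] - 1u;
        if (usable < result)
            result = usable;
    }
    return result;
}
```

For the Intel (8 slots) and Interlan (63 slots) mix this yields 7. For an all-3C400 network it yields 64,511.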


A mapping is defined which translates the VLN multicast
address to an Ethernet multicast address. The first
Multicast_Registered VLN multicast addresses are assumed to be
attended by each host. The local address portion of the internet
address of a VLN multicast channel is a decimal integer M in the
range 1,024 to 65,534.

1. (M - 1,023) <= Multicast_Registered. In this case, the
   Ethernet multicast address is

      09-00-08-00-mm-mm

2. (M - 1,023) > Multicast_Registered. The Ethernet broadcast
   address is used. A VLN component which attends VLN
   multicast addresses in this range must receive all
   broadcast frames, and select those with VLN destination
   address corresponding to the attended multicast address.
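
The two cases can be sketched in C as follows. The report does not say here how the `mm-mm` octets are derived from M. Purely for illustration, this sketch assumes they carry M as a big-endian 16-bit value. The names `eth_mapping` and `map_vln_multicast` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Result of mapping one VLN multicast address onto the Ethernet. */
typedef struct {
    uint8_t addr[6];
    bool    is_broadcast;   /* case 2: host must filter broadcasts itself */
} eth_mapping;

/* Returns false if m is not a VLN multicast local address (1,024..65,534). */
bool map_vln_multicast(uint32_t m, uint32_t multicast_registered, eth_mapping *out)
{
    if (m < 1024u || m > 65534u)
        return false;
    if (m - 1023u <= multicast_registered) {
        /* Case 1: 09-00-08-00-mm-mm.
         * ASSUMPTION: mm-mm holds M as a big-endian 16-bit value. */
        const uint8_t prefix[4] = {0x09, 0x00, 0x08, 0x00};
        memcpy(out->addr, prefix, sizeof prefix);
        out->addr[4] = (uint8_t)(m >> 8);
        out->addr[5] = (uint8_t)(m & 0xFFu);
        out->is_broadcast = false;
    } else {
        /* Case 2: the Ethernet broadcast address FF-FF-FF-FF-FF-FF. */
        memset(out->addr, 0xFF, sizeof out->addr);
        out->is_broadcast = true;
    }
    return true;
}
```

With Multicast_Registered equal to 7, for example, M = 1,030 still maps to an Ethernet multicast address, while M = 1,031 falls back to broadcast.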

®@ 
Delivered datagrams are accurate copies of transmitted
datagrams because VLN components do not deliver datagrams with
invalid Frame Check Sequences. A 32-bit CRC error-detecting code
is applied to Ethernet frames.
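
The Ethernet Frame Check Sequence is the IEEE 802.3 CRC-32, with generator polynomial 0x04C11DB7. Controllers compute and check it in hardware. The bitwise C version below is only a reference sketch of the same computation, useful for testing, and is not code from any Cronus component.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32 as used for the Ethernet FCS: polynomial 0x04C11DB7 processed
 * least-significant bit first (reflected constant 0xEDB88320), initial
 * value 0xFFFFFFFF, result complemented. */
uint32_t ether_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}
```

For the conventional check string "123456789" this function returns 0xCBF43926.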

Datagram duplication does not occur because the VLN layer
does not perform retransmissions, the primary source of
duplicates in other networks. Ethernet controllers do perform
retransmission as a result of collisions on the channel, but the
collision enforcement mechanism or "jam" assures that no
controller receives a valid frame if a collision occurs.


The sequencing guarantees hold because mutually exclusive
access to the transmission medium defines a total ordering on
Ethernet transmissions, and because a VLN component buffers all
datagrams in FIFO order.


(31). For the Cronus Advanced Development Model,
Multicast_Registered is currently defined to be 60.


15.4  VLN Operations


There are seven functions defined at the VLN interface. An
implementation of the VLN interface has wide latitude in the
presentation of these operations to the client; for example, the
functions may or may not return error codes.


The functions may occur synchronously or asynchronously
with respect to the client's computation. We expect that the
ResetVLNInterface, MyVLNAddress, SendVLNDatagram,
PurgeMAddresses, AttendMAddress, and IgnoreMAddress operations
will be synchronous with respect to the client.
ReceiveVLNDatagram will usually be asynchronous; that is, the
client initiates the operation, continues to compute, and at some
later time is notified that a datagram is available.


ResetVLNInterface()


The VLN for this host is reset. For the Ethernet
implementation, the operation ClearVPMap is performed,
and a frame of type "Cronus VLN" and subtype "Mapping
Update" is broadcast. This operation does not affect the
set of attended VLN multicast addresses.


MyVLNAddress()


Returns the VLN address of this host. 


SendVLNDatagram(Datagram)


When this operation completes, the VLN layer has copied
the Datagram. The transmitting process cannot assume that
the message has been delivered when SendVLNDatagram
completes.


ReceiveVLNDatagram(Datagram)


When this operation completes, Datagram is a
representation of a VLN datagram which has not previously
been received.


PurgeMAddresses()


When this operation completes, no VLN multicast addresses
are registered with the local VLN component.

AttendMAddress(MAddress) 


If this operation returns True then MAddress, which must 
be a VLN multicast address, is registered as an alias for 
this host, and messages addressed to MAddress by VLN 
clients will be delivered to the client on this host. 


IgnoreMAddress(MAddress)


When this operation completes, MAddress is not registered
as a multicast address for the client on this host.


Whenever a Cronus host comes up, ResetVLNInterface and
PurgeMAddresses are performed on the VLN. A VLN component may
depend upon state information obtained dynamically from other
hosts, and there is a possibility that incorrect information
might enter a component's state tables. A cautious VLN client
could call ResetVLNInterface periodically to force the VLN
component to reconstruct the tables.
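
For the Ethernet implementation, these state tables presumably include the map that ResetVLNInterface clears with ClearVPMap. The sketch below assumes that map is a small fixed table from VLN host addresses to Ethernet addresses, filled in from information learned from other hosts and discarded wholesale on reset. The names `vp_map`, `vp_learn`, `vp_lookup`, and `vp_clear` and the table size are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VP_MAP_SIZE 64   /* hypothetical table capacity */

typedef struct {
    bool     valid;
    uint32_t vln_addr;   /* VLN host address */
    uint8_t  eth[6];     /* Ethernet address learned for it */
} vp_entry;

typedef struct { vp_entry e[VP_MAP_SIZE]; } vp_map;

/* Analogous to ClearVPMap: forget every learned mapping. */
void vp_clear(vp_map *m) { memset(m, 0, sizeof *m); }

/* Record or overwrite a mapping learned from another host.
 * Returns false if the table is full. */
bool vp_learn(vp_map *m, uint32_t vln_addr, const uint8_t eth[6])
{
    vp_entry *free_slot = NULL;
    for (int i = 0; i < VP_MAP_SIZE; i++) {
        if (m->e[i].valid && m->e[i].vln_addr == vln_addr) {
            memcpy(m->e[i].eth, eth, 6);   /* replace a possibly stale entry */
            return true;
        }
        if (!m->e[i].valid && free_slot == NULL)
            free_slot = &m->e[i];
    }
    if (free_slot == NULL)
        return false;
    free_slot->valid = true;
    free_slot->vln_addr = vln_addr;
    memcpy(free_slot->eth, eth, 6);
    return true;
}

/* Look up the Ethernet address for a VLN host; false if unknown. */
bool vp_lookup(const vp_map *m, uint32_t vln_addr, uint8_t eth_out[6])
{
    for (int i = 0; i < VP_MAP_SIZE; i++) {
        if (m->e[i].valid && m->e[i].vln_addr == vln_addr) {
            memcpy(eth_out, m->e[i].eth, 6);
            return true;
        }
    }
    return false;
}
```

If a wrong entry is ever learned, clearing the whole table is the simplest way to recover. The cost is that correct entries must be learned again.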


A VLN component will limit the number of multicast addresses
to which it will simultaneously attend; if the client attempts to
register more addresses than this, AttendMAddress will return
False with no other effect.
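
The registration rules for AttendMAddress, IgnoreMAddress, and PurgeMAddresses can be modelled with a bounded table, loosely following the operations above. In this hypothetical sketch the limit `MAX_ATTENDED` is arbitrary. Rejecting arguments outside 1,024 to 65,534 is our assumption about how a component might treat a non-multicast address.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ATTENDED 4   /* hypothetical per-component limit */

typedef struct {
    uint32_t addr[MAX_ATTENDED];
    int      count;
} mcast_table;

static bool is_vln_multicast(uint32_t m) { return m >= 1024u && m <= 65534u; }

static int find(const mcast_table *t, uint32_t m)
{
    for (int i = 0; i < t->count; i++)
        if (t->addr[i] == m)
            return i;
    return -1;
}

/* After this call no multicast addresses are registered. */
void purge_maddresses(mcast_table *t) { t->count = 0; }

/* True if m is (now) attended; false, with no other effect, if m is
 * not a multicast address or the table is already full. */
bool attend_maddress(mcast_table *t, uint32_t m)
{
    if (!is_vln_multicast(m))
        return false;
    if (find(t, m) >= 0)
        return true;            /* already attended */
    if (t->count == MAX_ATTENDED)
        return false;           /* limit reached: state unchanged */
    t->addr[t->count++] = m;
    return true;
}

/* After this call m is not registered (whether or not it was before). */
void ignore_maddress(mcast_table *t, uint32_t m)
{
    int i = find(t, m);
    if (i >= 0)
        t->addr[i] = t->addr[--t->count];   /* order does not matter */
}
```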


The VLN layer does not guarantee buffering for datagrams at
either the sending or receiving host(s). It does guarantee that
a SendVLNDatagram function performed by a VLN client will
eventually complete; this implies that datagrams may be lost if
buffering is insufficient and receiving clients are too slow.
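
One way to meet both the FIFO sequencing guarantee and the rule that SendVLNDatagram always completes is a fixed ring of receive buffers whose arrival path never waits for the client. The sketch below is illustrative only. The buffer count `QLEN`, the size limit `MAXDG`, and all function names are assumptions, and a datagram that finds no free buffer is simply counted and dropped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QLEN  4      /* hypothetical number of receive buffers */
#define MAXDG 1500   /* hypothetical maximum datagram size in bytes */

typedef struct {
    unsigned char data[QLEN][MAXDG];
    size_t        len[QLEN];
    size_t        head;      /* index of the oldest buffered datagram */
    size_t        count;     /* number of buffered datagrams */
    unsigned long dropped;   /* datagrams discarded for lack of a buffer */
} dg_queue;

void dgq_init(dg_queue *q)
{
    q->head = 0;
    q->count = 0;
    q->dropped = 0;
}

/* Arrival path: never waits for the client. With no free buffer (or an
 * oversized datagram) the datagram is discarded and counted. */
void dgq_arrive(dg_queue *q, const void *dg, size_t n)
{
    if (q->count == QLEN || n > MAXDG) {
        q->dropped++;
        return;
    }
    size_t tail = (q->head + q->count) % QLEN;
    memcpy(q->data[tail], dg, n);
    q->len[tail] = n;
    q->count++;
}

/* Client side: copy out the oldest datagram, preserving arrival order.
 * buf must hold MAXDG bytes. Returns false if nothing is buffered. */
bool dgq_receive(dg_queue *q, void *buf, size_t *n)
{
    if (q->count == 0)
        return false;
    *n = q->len[q->head];
    memcpy(buf, q->data[q->head], *n);
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return true;
}
```

Datagrams that are kept come out in arrival order. A receiver that falls more than `QLEN` datagrams behind loses the excess, which is the loss the paragraph above allows.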


16 Generic Computing Element Operating System


One of the more important Cronus hardware components is the
Generic Computing Element (GCE), whose operating system is CMOS.
Prior to its introduction to the Cronus DOS project, CMOS was
under development at BBN as a real-time operating system for
several types of communication processors, such as gateways and
network terminal concentrators. In addition, a support
environment for building and debugging CMOS applications is
available under UNIX. CMOS provides the following basic
operating system features:


multiple processes
interprocess communication/coordination
asynchronous I/O
memory allocation
system clock management


CMOS is an open operating system; that is, no distinct
division exists between the operating system and the application
program. The operating system is a collection of library
routines that can be easily extended by adding new routines and
can be reduced by excluding unneeded routines. The programmer
can directly access lower-level interfaces.


CMOS is a portable operating system. The use of the high-level
language C is the principal factor in CMOS portability.
Small size and simplicity are other important factors. The
design minimizes the amount of machine-dependent code and
segregates it. The I/O system design allows for easy replacement
of device-dependent modules.


The debugging environment is provided by XMD, a display-oriented
debugger based on the PEN editor. All of the features
of the editor are available to the user in addition to the
debugger-specific commands. PEN is a multi-window editor with
capabilities for manipulating multiple files and edit buffers.
XMD displays a special configuration of windows that are
appropriate to debugging. This configuration consists of a source
window, a register display window, a breakpoint window, and a
window for displaying variables.

A low-level debugger is resident in the target processor to
interpret and execute commands sent to it over the communication
path, currently a terminal line to the C70 UNIX host processor
where XMD is running.


Access to networks will be provided to CMOS applications
at three levels. At the highest level, the user can open a TCP
stream. The first application at this level will be Telnet and
terminal concentration software. At the next level, there is an
internet datagram service. This will be used to implement
interprocess communication between hosts, as well as other
standard internet protocols. The lowest level is the Ethernet
local network interface.


The communication module in XMD will be changed to use the
Ethernet instead of a terminal line, increasing its flexibility
and usefulness. Downloading will be possible over the network,
and it will be easier to debug multiple GCEs from one site.


The internal device structure was changed to give the
I/O system more flexibility in dealing with the number of
possible relationships between hardware devices and the
interrupts generated by those devices. Without this change, the
ability to write simple device drivers for CMOS would be
compromised.
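
The report does not describe the new device structure in detail. As an illustration of the kind of flexibility meant, the hypothetical C sketch below lets several devices share one interrupt vector and lets one device attach to several vectors. Each driver supplies only a handler that checks whether its own device caused the interrupt. Every name and limit here is an assumption, not the CMOS interface.

```c
#include <stdbool.h>

#define NVECTORS    16  /* hypothetical number of interrupt vectors */
#define MAX_PER_VEC 4   /* hypothetical handlers per vector */

/* A driver's handler inspects its own device and returns true
 * if that device raised the interrupt. */
typedef bool (*intr_handler)(void *device);

typedef struct {
    intr_handler fn[MAX_PER_VEC];
    void        *dev[MAX_PER_VEC];
    int          n;
} vector_entry;

static vector_entry vectors[NVECTORS];

/* Attach a handler to a vector; a device may be attached to several
 * vectors, and several devices may share one vector. */
bool intr_attach(int vec, intr_handler fn, void *dev)
{
    if (vec < 0 || vec >= NVECTORS || vectors[vec].n == MAX_PER_VEC)
        return false;
    vector_entry *v = &vectors[vec];
    v->fn[v->n] = fn;
    v->dev[v->n] = dev;
    v->n++;
    return true;
}

/* Called from the low-level interrupt entry; polls every handler on
 * the vector and returns how many devices claimed the interrupt. */
int intr_dispatch(int vec)
{
    if (vec < 0 || vec >= NVECTORS)
        return 0;
    int claimed = 0;
    for (int i = 0; i < vectors[vec].n; i++)
        if (vectors[vec].fn[i](vectors[vec].dev[i]))
            claimed++;
    return claimed;
}

/* Example driver: a device with a pending flag. */
typedef struct {
    bool pending;
    int  serviced;
} flag_device;

bool flag_device_intr(void *device)
{
    flag_device *d = device;
    if (!d->pending)
        return false;
    d->pending = false;
    d->serviced++;
    return true;
}
```

Keeping the device-to-vector relationship in a table, rather than in each driver, is what lets the driver itself stay simple.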


A name service capability was added for the run-time binding
of string names to processes and devices. The name space is
hierarchical, and there is a notion of absolute and relative
pathnames. In the presence of some form of mass storage, the
names can be made non-volatile.
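
The report does not give the syntax of CMOS names. Assuming, for illustration only, UNIX-style names with `/` as the separator and `.` and `..` components, resolving a relative name against a current directory could be sketched as below. `resolve_name` and `MAXPATH` are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define MAXPATH 128   /* hypothetical limit on a resolved name */

/* Resolve `name` against the current directory `cwd` (itself absolute).
 * Names beginning with '/' are absolute; all others are relative.
 * "." components are dropped and ".." removes the previous component.
 * Writes the result to out; returns false if it would not fit. */
bool resolve_name(const char *cwd, const char *name, char out[MAXPATH])
{
    char buf[2 * MAXPATH];
    size_t len = 0;   /* current length of out */

    if (name[0] == '/') {
        if (strlen(name) >= sizeof buf)
            return false;
        strcpy(buf, name);
    } else {
        if (strlen(cwd) + 1 + strlen(name) >= sizeof buf)
            return false;
        strcpy(buf, cwd);
        strcat(buf, "/");
        strcat(buf, name);
    }

    out[0] = '\0';
    for (char *tok = strtok(buf, "/"); tok != NULL; tok = strtok(NULL, "/")) {
        if (strcmp(tok, ".") == 0)
            continue;
        if (strcmp(tok, "..") == 0) {          /* back up one level */
            char *slash = strrchr(out, '/');
            if (slash != NULL) {
                *slash = '\0';
                len = (size_t)(slash - out);
            }
            continue;
        }
        size_t tl = strlen(tok);
        if (len + 1 + tl >= MAXPATH)
            return false;
        out[len++] = '/';
        memcpy(out + len, tok, tl + 1);        /* copies the terminator too */
        len += tl;
    }
    if (len == 0) {                            /* resolved to the root */
        out[0] = '/';
        out[1] = '\0';
    }
    return true;
}
```

For example, resolving `../dev/./eth0` relative to `/proc/net` yields `/proc/dev/eth0`. The sketch uses `strtok`, which is not reentrant. A system with multiple processes resolving names concurrently would need a reentrant tokenizer.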